Abstract

Modern non-invasive brain imaging technologies, such as diffusion weighted magnetic resonance imaging (DWI), enable the mapping of neural fiber tracts in the white matter, providing a basis to reconstruct a detailed map of brain structural connectivity networks. Brain connectivity networks differ from random networks in their topology, which can be measured using small-worldness, modularity, and high-degree nodes (hubs). Still, little is known about how individual differences in structural brain network properties relate to age, sex, or genetic differences. Recently, some groups have reported brain network biomarkers that enable differentiation among individuals, pairs of individuals, and groups of individuals. In addition to studying new topological features, here we provide a unifying general method to investigate topological brain networks and connectivity differences between individuals, pairs of individuals, and groups of individuals at several levels of the data hierarchy, while appropriately controlling false discovery rate (FDR) errors. We apply our new method to a large dataset of high quality brain connectivity networks obtained from High Angular Resolution Diffusion Imaging (HARDI) tractography in 303 young adult twins, siblings, and unrelated people. Our proposed approach can accurately classify brain connectivity networks based on sex (93% accuracy) and kinship (88.5% accuracy). We find statistically significant differences associated with sex and kinship both in the brain connectivity networks and in derived topological metrics, such as the clustering coefficient and the communicability matrix.

Keywords: Anatomical brain connectivity, complex networks, diffusion weighted MRI, topological analysis, hierarchical analysis, false discovery rate, sex and kinship brain network differences.

1. Introduction


Modern non-invasive imaging technologies such as Diffusion Weighted Magnetic Resonance Imaging (DWI) make it possible to estimate the local orientation of neural fiber bundles in the white matter, providing reliable anatomical information on brain connectivity and anatomical networks (Iturria-Medina et al., 2007; Hagmann et al., 2008, 2007; Gigandet et al., 2008; Bullmore and Bassett, 2010; Bullmore and Sporns, 2009; Bassett et al., 2011). Topological properties of complex networks, such as those describing brain connectivity, have been analyzed and compared to random networks using traditional (Rubinov and Sporns, 2010; Boccaletti et al., 2006; Sporns and Kotter, 2004; Onnela et al., 2005; Blondel et al., 2008) and new topological metrics (Easley and Kleinberg, 2010; Lohmann et al., 2010; Shepelyansky and Zhirov, 2010; Bullmore and Bassett, 2010; Bassett et al., 2010, 2011; Estrada, 2010; Estrada and Higham, 2010). Still, relatively little is known about how functional and structural brain networks differ between different populations, and how their properties are associated with, for example, age, sex, and genetic factors. Large datasets, as presented here, are vital for making robust statements about network properties and factors that consistently affect them.

Recent work has identified effects of sex, age, heritability, and neurological disorders on some aspects of brain networks derived from structural and functional MRI. Pattern recognition methods, such as feature selection, dimension reduction, and classification, have been used to predict brain maturity (Dosenbach et al., 2010; Thomason et al., 2011) and activity (Richiardi et al., 2010) from functional MRI (fMRI), and also the effects of aging on brain connectivity measured from DWI scans (de Boer et al., 2011). In recent work, we identified significant sex and genetic differences using network data at the edge (node-to-node connectivity) level, from Diffusion Tensor Imaging (DTI) (Jahanshad et al., 2010) and High Angular Resolution Diffusion Imaging (HARDI) scans (Jahanshad et al., 2011). In general, these anatomical studies create a connectivity matrix that describes the proportion of detected brain fibers that interconnect all pairs of regions, taken from a set of regions of interest. This results in a matrix of connectivity values that can be treated as an N × N image and analyzed using voxel-based statistical analysis approaches (Jahanshad et al., 2011). Additional studies have reported age and sex differences in DWI data and in global topological metrics (Gong et al., 2009), as well as genetic effects (Fornito et al., 2011). Abnormalities in patients with schizophrenia (Rubinov and Bassett, 2011) have also been reported in connectivity studies using fMRI.

Here we propose a unifying, robust and general method to investigate brain connectivity differences among individuals, pairs of individuals, and groups of individuals (classes), at several levels of the network hierarchy: global, node, and node-to-node or network subgraphs. We use robust pattern recognition techniques to identify brain connectivity/network differences at the individual level (which also includes pairs of individuals). We also describe families of hypothesis tests to identify differences at the group or class level. We apply this method to a large dataset of high quality brain connectivity networks, obtained from HARDI. This allows us to study organizational differences between the human brain and random networks, and brain connectivity differences associated with sex and kinship.




Our method has the following unique characteristics:

• Robust feature selection using Support Vector Machines (SVMs) and n-fold cross-validation.

• Robust overall classification performance evaluation using n-fold cross-validation and permutation tests.

• Hierarchical analysis of brain connectivity network differences, simultaneously studying the networks at multiple structural levels.

• Robust overall control of the false discovery rate (FDR) error, especially with hierarchies of multiple families of hypothesis tests.

• Analysis of a large high quality dataset that involves a robust normalization step.

Using this method, we set out to answer the following questions (research lines):

1. Can we classify individuals in terms of sex or pairs of individuals in terms of kinship using the HARDI-derived connectivity matrices?

2. Can we classify individuals in terms of sex or pairs of individuals in terms of kinship using topological measures of the associated network digraphs?

3. Are there any differences in the connectivity matrices attributable to sex differences or kinship?

4. Do brain connectivity networks and random networks differ in topology?

5. Is some proportion of the variance in brain network topology attributable to sex or kinship?

This study of sex and kinship from connectivity networks illustrates the framework and addresses key biological questions.

The topological metrics considered here can be arranged in a hierarchical tree, from global to node-to-node (Figure 1). Network differences at the individual level (including pairs of individuals) are covered by the proposed research lines 1 and 2. Research lines 3 and 5 refer to class (sex and kinship) properties. We also look for global topological differences between real and random networks (research line 4), as these have been frequently reported in the literature (Iturria-Medina et al., 2007; Gong et al., 2009; Bassett et al., 2010; Fornito et al., 2011; Bassett et al., 2011). Here, we study brain connectivity differences using a wide variety of traditional and recent global, cortical (node), and inter-cortical (node-to-node) topological metrics not used before in a single large scale study of high quality diffusion MRI data.

Our relatively large set of high quality diffusion MRI data allows us to consider more related individuals than have been studied before for analyzing structural connectivity. We consider all possible pair-wise comparisons between the different kinships.

The rest of the paper is organized as follows: Section 2 describes the diffusion MRI data we analyze and how the data are processed to produce the anatomical brain connectivity information and networks. Section 3 introduces the questions we address and our proposed approach using robust pattern recognition methods and multiple hypothesis testing, while controlling the FDR. Section 4 reports results for sex and kinship classification based on the brain connectivity matrices and network topology measures. Section 4 also presents results of hypothesis tests on the brain connectivity and brain topological network differences due to sex and kinship, as well as topological differences between human and random brain networks. Section 5 discusses the results, and some caveats and limitations. Section 6 presents the conclusions of this work.

2. Estimation of Brain Structural Connectivity

2.1. Diffusion MRI Data Acquisition and Processing

The raw data set consists of 4 Tesla HARDI and standard T1-weighted structural MRI images, for 303 individuals (193 women and 110 men), between 20 and 30 years old (mean age: 23.5 ± 1.9 SD years). From these subjects, we are able to form different pair-wise kinship relationships between identical twins (50), non-identical multiples (64 non-identical twin pairs and a non-identical triplet, forming 67 pair-wise relationships), and non-twin siblings (35).¹ In addition, there are 35 unrelated individuals, from whom we can obtain (35 × 34)/2 = 595 pairs of unrelated people, but we choose only 100 of them at random, to avoid unbalancing the number of pairs chosen for each class. In summary, we have 50 + 67 + 35 + 100 = 252 pair-wise relationships for our kinship analysis.
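The pair counts above can be checked with a few lines of Python. This is only a sketch of the subsampling step: the integer subject identifiers, the function name `sample_unrelated_pairs`, and the fixed random seed are illustrative assumptions, not details given in the text.

```python
import itertools
import random


def sample_unrelated_pairs(subject_ids, n_pairs, seed=0):
    """Draw n_pairs distinct unordered pairs uniformly at random."""
    all_pairs = list(itertools.combinations(subject_ids, 2))
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(all_pairs, n_pairs)


unrelated = list(range(35))  # hypothetical identifiers for the 35 unrelated subjects
n_possible = len(list(itertools.combinations(unrelated, 2)))  # (35 * 34) / 2
pairs = sample_unrelated_pairs(unrelated, 100)

# identical twins + non-identical multiples + non-twin siblings + sampled unrelated pairs
total_pairs = 50 + 67 + 35 + len(pairs)
```

Sampling without replacement from the full list of unordered pairs guarantees that no unrelated pair is counted twice.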

¹ The group of non-twin siblings overlaps the group of twins and triplets, since an individual can have 2 or more siblings that are twins (or triplets).

All MR images were collected using a 4 Tesla Bruker Medspec MRI scanner, with a transverse electromagnetic (TEM) head coil, at the Center for Magnetic Resonance, University of Queensland, Australia. T1-weighted images were acquired with an inversion recovery rapid gradient echo sequence (TI/TR/TE = 700/1500/3.35 ms; flip angle = 8°; slice thickness = 0.9 mm, with a 256³ acquisition matrix). Diffusion-weighted images were acquired using single-shot echo planar imaging with a twice-refocused spin echo sequence to reduce eddy-current induced distortions. Imaging parameters were: TR/TE = 6090/91.7 ms, 23 cm FOV, with a 128 × 128 acquisition matrix. Each 3D volume consisted of 55 2-mm thick axial slices with no gap, and a 1.79 × 1.79 mm² in-plane resolution. We acquired 105 images per subject: 11 with no diffusion sensitization (i.e., b0 images) and 94 diffusion-weighted (DW) images (b = 1159 s/mm²) with gradient directions evenly distributed on the hemisphere, as is required for unbiased estimation of white matter fiber orientations. Scan time was 14.2 minutes. Non-brain regions were automatically removed from each T1-weighted MRI scan, and from a b0 image obtained from the DWI data set, using the BET FSL tool.² A trained neuroanatomical expert manually edited the T1-weighted scans to further refine the brain extraction. All T1-weighted images were linearly aligned using FSL (with 9 DOF³) to a common space (Holmes et al., 1998), with 1 mm isotropic voxels and a 220 × 220 × 220 voxel matrix.

Raw diffusion-weighted images were corrected for eddy current distortions using the FSL eddy current distortion correction tool. For each subject, the 11 non-diffusion-weighted images (with no diffusion sensitization) were averaged, resampled, and linearly aligned to a down-sampled version of the same subject's T1-weighted anatomical image (110 × 110 × 110, 2 × 2 × 2 mm). Averaged b0 maps were then elastically registered to the structural scan using an inverse consistent registration algorithm with a mutual information cost function (Leow et al., 2005), to compensate for high-field echo-planar imaging (EPI) induced susceptibility artifacts. This elastic registration further refines the linear intra-subject registration.

² http://fsl.fmrib.ox.ac.uk/fsl/

³ The expected deformations are only translation, rotation, and anisotropic scaling; no shearing between T1s of the same subject.

Thirty-five cortical labels per hemisphere (Table S1, in the supplementary material) were automatically extracted from all high resolution aligned T1-weighted structural MRI scans using FreeSurfer⁴ (Fischl et al., 2004). The output labels from FreeSurfer (1-35) for each hemisphere were combined into a single image. As a linear registration is performed within the software, the resulting T1-weighted images and cortical models were aligned to the original T1 input image space and down-sampled using nearest neighbor interpolation (to avoid intermixing of labels) to the space of the DWIs. To ensure tracts would intersect labeled cortical boundaries, labels were dilated simultaneously (to prevent overlap) with an isotropic box kernel of 5 voxels.

⁴ http://surfer.nmr.mgh.harvard.edu/

Tractography is performed by randomly choosing seed voxels of the white matter with a prior probability based on the fractional anisotropy (FA) value derived from the diffusion tensor model (Basser and Pierpaoli, 1996). We use a global probabilistic approach inspired by the voting procedure of the popular Hough transform (Gonzales and Woods, 2008; Duda and Hart, 1972). The tractography algorithm tests a large number of candidate 3D curves originating from each seed voxel, assigns a score to each, and returns the curve with the highest score as the estimated pathway. The score of each curve is computed from the agreement between the estimated curve and fiber orientations as derived from the Orientation Distribution Functions (ODFs) (Aganj et al., 2011). At each voxel of the DWI dataset, ODFs are computed using the normalized and dimensionless ODF estimator derived for HARDI in Aganj et al. (2011), which is mathematically more accurate than the original Q-Ball Imaging (QBI) definition (Tuch, 2004) and also outperforms it; e.g., it improves the resolution of multiple fiber orientations (Aganj et al., 2011).

As it is an exhaustive search, this algorithm avoids entrapment in local minima within the discretization resolution of the parameter space. Furthermore, the specific definition of the candidate's tract score attenuates noise by integrating the real-valued local votes derived from the diffusion data.⁵ Further details of the method can be found in (Aganj et al., 2011).
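The two generic ingredients of this procedure, FA-weighted seed selection and exhaustive scoring of candidate curves, can be sketched as follows. This is not the tractography algorithm of Aganj et al. (2011): the function names, the dictionary of FA values, and in particular the toy score (curve length) are placeholders chosen for illustration, whereas the actual score measures agreement between a curve and the ODF-derived fiber orientations.

```python
import random


def sample_seeds(fa_by_voxel, n_seeds, seed=0):
    """Sample seed voxels with probability proportional to their FA value."""
    rng = random.Random(seed)
    voxels = list(fa_by_voxel)
    weights = [fa_by_voxel[v] for v in voxels]
    return rng.choices(voxels, weights=weights, k=n_seeds)


def best_curve(candidates, score):
    """Score every candidate curve exhaustively and return the highest-scoring one."""
    return max(candidates, key=score)


# Hypothetical FA map: voxel coordinates -> FA value.
fa = {(0, 0, 0): 0.0, (1, 0, 0): 0.8, (2, 0, 0): 0.2}
seeds = sample_seeds(fa, n_seeds=50)

# Two toy candidate curves (lists of voxel coordinates) starting at the same seed.
toy_candidates = [[(1, 0, 0)], [(1, 0, 0), (1, 1, 0), (1, 2, 0)]]
best = best_curve(toy_candidates, score=len)  # placeholder score, not the ODF-based one
```

Because every candidate is scored, the selection cannot get trapped in a local optimum within the set of candidates considered, which is the property the text attributes to the exhaustive search.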

Elastic deformations obtained from the EPI distortion correction, mapping the average b0 image to the T1-weighted image, were then applied to the tracts' 3D coordinates. To avoid considering small noisy tracts, tracts with fewer than 15 fibers were filtered out.

⁵ In the near future, this algorithm will be released through the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC) online repository, and is available upon request.

2.2. Computing Connectivity Matrices and Brain Networks

From the cortical labeling and tractography, symmetric matrices of connectivity (70 × 70) are built, one per subject. Each entry contains the number of fibers connecting each pair of cortical regions (Table S1) within and across each brain hemisphere. Connectivity matrices based on fiber counts should always be normalized to the [0, 1] range, as the number of fibers detected varies from individual to individual. In addition, there is a bias in the number of fibers detected by tractography that start or end in any given cortical region, due to fiber crossings, fiber tract length, volume of the cortical region, and proximity to large tracts like the corpus callosum (Jahanshad et al., 2011; Hagmann et al., 2008, 2007; Bassett et al., 2011). However, there is no unique way to normalize the fiber tract count (Bassett et al., 2011).

We decided not to use the normalizations proposed in (Hagmann et al., 2008, 2007; Bassett et al., 2011), as they involve geometric measures including the volume of the cortical regions and the mean path length of fibers connecting each two regions. Instead, we considered three purely topological normalizations, since, as in (Gong et al., 2009), we want to find pure topological network differences due to, e.g., sex and kinship:

$$w_{ij} = \frac{a_{ij}}{\sum_{i,j} a_{ij}}, \tag{1}$$

$$w_{ij} = \frac{a_{ij}}{\sqrt{\sum_{i} a_{ij} \, \sum_{j} a_{ij}}}, \tag{2}$$

$$w_{ij} = \frac{a_{ij}}{\sum_{j} a_{ij}}, \tag{3}$$

where $a_{ij}$ represents the entries in the original fiber count matrix, $A$, and $w_{ij}$ the entries (weights) of the now normalized 70 × 70 connectivity matrix, $W$.

Equation (1) (used in our previous work, Jahanshad et al. 2011) normalizes the fiber count for each pair of regions by the total number of fibers in the entire brain, reducing variability among the connectivity matrices due to differences in the total number of fibers found. In practice, this normalization can provide biased weights, since it does not take into account that a higher number of fibers will be found in some regions, e.g., in the vicinity of the corpus callosum, and also more fibers would be counted in cortical regions with larger areas (Hagmann et al., 2008; Bassett et al., 2011).

Equation (3), first proposed by Behrens et al. (2007) in the context of tractography, can be interpreted as the probability of connecting cortical regions $i$ and $j$, given that there are $a_{ij}$ fibers between them and there are $\sum_j a_{ij}$ fibers available on cortical region $i$. Equation (2) (Crofts and Higham, 2009) divides the number of fibers between any two cortical regions by the geometric mean of the number of fibers leaving either region. The assumption here is stronger than that of Equation (3), as it assumes the same total number of fibers on each pair of brain regions. This can lead to bias due to large differences in the total number of fibers on each region (locally), but it should be correct on average (globally). An equivalent normalization was used in (Gong et al., 2009), where instead of the geometric mean, they used an arithmetic mean, averaging $w_{ij}$ and $w_{ji}$ from Equation (3).

Equations (1) and (2) lead to undirected connectivity graphs, which are typical in structural brain connectivity analysis. Equation (3), on the other hand, leads to directed graphs (digraphs). To see this, note that in general $\sum_i a_{ij} \neq \sum_j a_{ij}$, i.e., the total number of fibers on cortical regions $i$ and $j$ can be different on either side of the connection; hence, in general, $w_{ij} \neq w_{ji}$ in Equation (3). Normalizations (1)-(3) are further modified as

$$w_{ij} \leftarrow \frac{w_{ij}}{\max_{i,j}\{w_{ij}\}},$$

where $w_{ij}$ is defined as indicated in Equations (1)-(3), in order to reduce the differences among different connectivity matrices (different subjects), thereby making $\max\{w_{ij}\} = 1$. Equations (2) and (3), modulated by $\max\{w_{ij}\}$, significantly reduce the mean effect of brain size differences between men and women (see the regression analysis in the Appendix), which is a known confounding factor in analyses of sex differences (Leonard et al., 2008).
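The normalizations described in this section can be sketched in plain Python for a small symmetric fiber count matrix. The function names and the 3 × 3 example matrix are illustrative assumptions; the form used for Equation (2) follows the textual description (division by the geometric mean of the two regions' fiber totals), and the final rescaling divides by the largest weight so that the maximum becomes 1.

```python
def normalize_total(a):
    """Eq. (1): divide each count by the total fiber count of the matrix."""
    total = sum(sum(row) for row in a)
    return [[x / total for x in row] for row in a]


def normalize_geometric(a):
    """Eq. (2): divide by the geometric mean of the two regions' fiber totals."""
    n = len(a)
    row_tot = [sum(a[i]) for i in range(n)]
    col_tot = [sum(a[i][j] for i in range(n)) for j in range(n)]
    return [[a[i][j] / (row_tot[i] * col_tot[j]) ** 0.5 if a[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def normalize_row(a):
    """Eq. (3): divide by the fiber total of region i (rows sum to one)."""
    return [[x / sum(row) if sum(row) else 0.0 for x in row] for row in a]


def rescale_max(w):
    """Divide by the largest weight so that the maximum entry equals 1."""
    largest = max(max(row) for row in w)
    return [[x / largest for x in row] for row in w]


# Hypothetical symmetric fiber counts between three regions.
A = [[0, 2, 6],
     [2, 0, 8],
     [6, 8, 0]]

w1 = normalize_total(A)
w2 = normalize_geometric(A)
w3 = normalize_row(A)
w3_scaled = rescale_max(w3)
```

With this input, `w1` and `w2` stay symmetric, while `w3[0][1]` is 0.25 and `w3[1][0]` is 0.2, so only the row normalization yields a directed graph.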

Here, we work with the normalization provided by Equation (3),⁶ because it reduces the effect of brain size. The resulting connectivity matrices are asymmetric; this asymmetry comes from the normalization and not from the tractography results. This is beneficial as it uses all available entries in the matrix, while traditional symmetric matrices, as obtained from the other two normalizations, only use half of the matrix to store network information. This extra information is not an artifact of the normalization: it provides more information about differences between two connected brain regions. Two cortical regions are connected by the same number of fibers, but the proportion of fibers dedicated to that particular connection can be very different within each cortical region. For instance, consider the case where cortical region $i$ connects exclusively to region $j$, but region $j$ connects not only to $i$, but also to many other regions. In terms of probability of connection, $p_{ij} = 1$ and $p_{ik} = 0$ for $k \neq j$, since $i$ connects exclusively to $j$ ($p_{ij}$ being the probability of connecting region $i$ with region $j$). However, $p_{ji} < 1$, and $p_{jk} \neq 0$ for some regions $k$, satisfying in both cases $\sum_k p_{ik} = \sum_k p_{jk} = 1$ (all the regions must be connected); hence, $p_{ij} \neq p_{ji}$. In the general case, each cortical region connects to a different number of other cortical regions, so in general, $p_{ij} \neq p_{ji}$, as in Equation (3). We consider that capturing this asymmetry in the connectivity matrices $W$ is important, and this is validated in the experimental results.

⁶ The basic method introduced later for analyzing brain networks, in particular the features for undirected networks and the statistical analysis, can still be applied to the other possible normalizations as well.
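As a small worked instance of this argument (the fiber counts are made up for illustration and do not come from the dataset), take three regions in which region 1 connects only to region 2 with $a_{12} = a_{21} = 4$ fibers, while region 2 also connects to region 3 with $a_{23} = 12$ fibers. Applying the row normalization of Equation (3) gives:

```latex
\begin{align*}
p_{12} &= \frac{a_{12}}{\sum_k a_{1k}} = \frac{4}{4} = 1, \\
p_{21} &= \frac{a_{21}}{\sum_k a_{2k}} = \frac{4}{4 + 12} = 0.25, \\
p_{23} &= \frac{a_{23}}{\sum_k a_{2k}} = \frac{12}{4 + 12} = 0.75 .
\end{align*}
```

The probabilities leaving each region sum to one, yet $p_{12} \neq p_{21}$ even though the underlying fiber count is the same in both directions.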

In summary, we derived 303 normalized 70 × 70 connectivity (network) matrices $W$, one per subject, by applying probabilistic tractography to HARDI at 4T. These matrices provide our basis for studying anatomical brain connectivity, as described next.

3. Methods

The research lines addressed here (see the Introduction) are independent, as they answer different questions and there is no interaction or inference among them. It is important to state the independence of these research lines, as it implies that there is no need for an overall FDR error control, other than the FDR control on each research line (Benjamini and Hochberg, 1995; Yekutieli, 2008). The first two research lines are addressed simultaneously using robust pattern recognition methods that extend well to unobserved data (Section 3.1). The last three research lines are addressed using statistical hypothesis testing (non-parametric bootstrap), where the corresponding null hypotheses are stated as:

1. There are no differences in the connectivity matrix. Given that there are $O(n^2)$ weights on a connectivity matrix of $n$ nodes, there are $O(n^2)$ local null hypotheses to be tested, one for each connection, forming a large family of hypothesis tests. As $n = 70$ in our case, we could have up to 4900 hypotheses to test for differences in the connectivity matrices.⁷

2. There are no global topological differences between real networks and random networks. In general, we can have $m$ global topological metrics (see Figure 1 and Section 3.2 for details), forming a single family of hypothesis tests.

3. There are no topological differences, at any scale, on the directed networks due to sex or kinship (Figure 1). Hence, we have $m$ hypotheses to test at the global level; possibly $m$ families of hypotheses at the node level (one for each global hypothesis), each with $O(n)$, $n = 70$, null hypotheses to test for differences at each node; and several families of hypotheses at the node-to-node level, where each family corresponds to a topological metric at the node-to-node level (Figure 1) and consists of $O(n^2)$ hypotheses to test, one for each pair of nodes.

The first two null hypotheses require only a single (albeit possibly large) family of hypothesis tests, while the last one requires several families of hierarchically related hypothesis tests, where families of hypotheses at the node-to-node level can consist of $O(n^2)$ local hypotheses (up to 4900 hypotheses in our case, $n = 70$).
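For a single family of tests, FDR control in the sense of Benjamini and Hochberg (1995) can be sketched with the standard step-up procedure below. This is only the single-family building block, under the assumption that one p-value per hypothesis is already available; it does not implement the hierarchical testing of several related families (Yekutieli, 2008), and the function name and example p-values are illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Step-up BH procedure: return True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for a family of eight hypotheses.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(p, q=0.05)
```

In a connectivity family with up to 4900 local hypotheses, the same function would be applied to the p-values of the connections that are actually tested.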

⁷ Of course, we only look for statistically significant differences where the number of connections detected is more than zero.

At the population level, we consider only average network differences in the connectivity matrix (research line 3, see Introduction), or in the topological metrics of the associated graphs (research line 5 in the Introduction), resulting from sex and kinship, as we know a priori that the variability between the connectivity matrices of individuals can be as large as the variability between the connectivity matrices within the same group (same sex or same kinship relationship), an observation derived both from previous studies (Bassett et al., 2011) and from our own dataset.

We consider the two classes women and men, based on sex; and the four classes identical twins, non-identical multiples, non-twin siblings, and unrelated individuals, based on kinship relationships. These are used for classification at the individual (including pairs of individuals for kinship) level and for hypothesis testing at the group level.

Our analysis of kinship follows previous genetic studies of brain connectivity (Jahanshad et al., 2011, 2010; Rubinov and Bassett, 2011; Fornito et al., 2011; Thompson et al., 2001). One traditional line of analysis in genetic studies uses a classical twin design to compute intra-pair (or intra-class) correlations between measures of cortical gray matter density (Thompson et al., 2001), connectivity matrices (Jahanshad et al., 2011, 2010), or wavelets representing the connectivity matrices (Fornito et al., 2011). However, these correlation operations reduce the data to a single matrix of correlations, and heritability statistics, for all pairs of subjects in the same group.

For kinship analysis, we work with the absolute value of the differences in the connectivity matrix and with network differences in the topological metrics considered, between pairs of individuals. These pair-wise differences are differences between pairs of identical twins, differences between pairs of non-identical multiples, differences between siblings who are not twins, and finally differences between pairs of unrelated people. We use pairwise differences within and across families, as they allow us to detect genetically-mediated effects in pairings with different degrees of known genetic affinity (Thompson et al., 2001).
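The pair-wise feature described here, the entry-wise absolute difference between two subjects' connectivity matrices, can be sketched as follows. The function names, the subject identifiers, and the tiny 2 × 2 matrices are illustrative assumptions; the actual matrices are 70 × 70.

```python
def abs_difference(w_a, w_b):
    """Entry-wise absolute difference between two connectivity matrices."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(w_a, w_b)]


def pair_features(matrices, pairs):
    """Build (kinship_label, difference_matrix) tuples for labeled subject pairs."""
    return [(label, abs_difference(matrices[a], matrices[b]))
            for a, b, label in pairs]


# Hypothetical normalized matrices for two subjects forming one twin pair.
matrices = {
    "s1": [[0.0, 0.5], [0.25, 0.0]],
    "s2": [[0.0, 0.25], [0.75, 0.0]],
}
features = pair_features(matrices, [("s1", "s2", "identical twins")])
```

Taking the absolute value makes the feature independent of the order in which the two members of a pair are listed.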

To avoid losing pairs of subjects in the kinship analyses, we did not constrain the pairwise differences between individuals to be of the same sex, which in our study corresponds approximately to half the non-identical multiples considered. The statistical power of the tests of kinship differences might be reduced by the confounding effects of sex differences, but at the same time, we are also increasing the statistical power of the test (Winer, 1971), by considering a larger number of pairwise differences.

3.1. Classification

333  Here,  we  want  to  classify  individual  brain  connectivity  networks  in  terms 

334  of  sex  (women  and  men)  and  pairs  of  individuals  in  terms  of  kinship,  using 

335  the  connectivity  matrices  or  the  associated  network  topology  metrics  at  the 

336  node  or  node-to-node  level. 

337  In  classification,  we  encounter  the  multiple  comparisons  problem  (MCP), 

338  which  arises  whenever  we  test  multiple  hypotheses  simultaneously.  If  we 

339  do  not  correct  for  this,  then  the  more  hypotheses  tested,  the  higher  the 

340  probability  of  obtaining  at  least  one  false  positive. 

The MCP can be dealt with in classification via n-fold cross-validation. In fact, cross-validation can be more effective than Bonferroni-type corrections (Jensen and Cohen, 2000), as it does not test on the same data used to derive the model. Here we use 10-fold cross-validation, a good trade-off between robustness to unobserved data and using as much data as possible to train the classifiers (Refaeilzadeh et al., 2009). In addition to cross-validation, we also use permutation tests (see Appendix for details) to non-parametrically evaluate the null hypothesis that the classifiers might have obtained good classification accuracies just by chance (Ojala and Garriga, 2010). In this work, we use Support Vector Machine (SVM) classifiers, as they extend well to unobserved data (Vapnik, 1998) and deal with the MCP by reducing the number of comparisons to the number of support vectors.

Given the high dimensionality ($\mathbb{R}^{n^2}$, n = 70 nodes) of the brain connectivity networks and associated topological metrics considered here (see Section 3.2 for their full description), we use feature selection methods to reduce the effective dimensionality of the data. Here, a feature is any of the connectivity or topological network differences at the node-to-node and single-node levels. Feature selection methods can significantly improve classification accuracy, even for classifiers that exploit the higher discrimination possibilities in high-dimensional spaces, such as SVMs (Vapnik, 1998; Guyon and Eliseeff, 2003). In general, three kinds of methods are used for feature selection: filters, wrappers, and embedded methods (Guyon and Eliseeff, 2003). Filter methods employ a ranking criterion, such as the Pearson cross-correlation (used, for example, in Dosenbach et al. 2010), mutual information, or the Fisher criterion, together with a given threshold to filter out low-ranked features. Wrappers use the classifier itself to evaluate the importance of each feature and explore the whole feature space using, for instance, gradient-based methods, genetic algorithms, or greedy algorithms. Filter methods are very fast and independent of the selected classifier; however, they can lead to the selection of redundant features (Guyon and Eliseeff, 2003). They also disregard features with relatively small individual influence that can potentially have an influential effect as a group. Wrappers, on the other hand, can avoid redundant features and identify influential subgroups of features. However, they are computationally intensive, since the subset feature selection problem is NP-hard (Amaldi and Kann, 1998), and are strongly dependent on the classifier used (Guyon and Eliseeff, 2003). Embedded methods also use a classifier to evaluate the importance of subgroups of features and are, in that sense, wrappers. However, they provide a trade-off between other wrappers and filter methods in terms of computational efficiency and reduced number of features, since they introduce a penalty term that enforces a small number of features (Guyon and Eliseeff, 2003).

An alternative to feature selection is dimension reduction, using methods such as Principal Components Analysis (PCA) and Independent Component Analysis (ICA). See Hartmann 2006 for a comparison of both methods in the context of machine learning. Here, we preferred feature selection methods, as the features in dimension reduction methods are in general functions of the original features (see footnote 8) and cannot be associated with a unique “physical” feature in the original data space. In particular, we use the SVM-based embedded feature selection algorithm proposed by Guyon et al. 2002. When selecting features with a classifier there is a risk of “double-dipping,” i.e., training the feature selection algorithm and testing it with the same data, which leads to unrealistically high accuracies (over-fitting) that do not extend well to unseen data (Kriegeskorte et al., 2009; Refaeilzadeh et al., 2009). To avoid this, the feature selection algorithm uses 10-fold cross-validation (see footnote 9), selecting the features that contribute most to classification but that are also most stable across the different cross-validation sets of data (Kriegeskorte et al., 2009; Refaeilzadeh et al., 2009). In the proposed framework, feature selection algorithms extract the $m \ll n^2$ most relevant features from the digraph matrices taken as high-dimensional vectors in $\mathbb{R}^{n^2}$, n = 70, and the classifiers then operate on the reduced feature vectors in $\mathbb{R}^m$.

8 PCA, for instance, is a projection of the original features onto the matrix eigen-space, and hence is a linear combination of the original features.

We tested classification performance using the following standard measures:

• The overall classification accuracy.

• The sensitivity and specificity (see footnote 10).

• The balanced error rate (BER), which corresponds to the average of the errors on each class.

• The area under the receiver operating characteristic (ROC) curve, which measures the probability that the classifier can actually discriminate the true class from the incorrect one(s).

• The kappa statistic, which measures the agreement of the classifier with the labels, taking into account the probability that the agreement has been obtained by chance. It uses the confusion matrix to make this assessment.

• Permutation test p-values, which non-parametrically assess the probability that the classification results were obtained by chance, by estimating the null hypothesis distribution.

9 Training with 90% of the data and testing on the remaining 10%, and repeating the process 10 times with randomly selected training and testing samples.

10 As is usual in binary classification, we report sensitivity and specificity for women only, given that the sensitivity for men is numerically the same as the specificity for women and the specificity for men is numerically the same as the sensitivity for women.

For space considerations, the confusion matrices were not included here, and can be found in the supplementary material.
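Several of the measures listed above follow directly from a 2x2 confusion matrix. The following is a minimal Python sketch of that arithmetic, assuming a binary problem in which one class (for example, women) is treated as the positive class. The function name `binary_metrics` and the counts in the example are hypothetical and are not results from this study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Summary measures from a 2x2 confusion matrix.

    tp/fn: positive-class items classified correctly/incorrectly;
    fp/tn: negative-class items classified incorrectly/correctly.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Balanced error rate: average of the error rates of the two classes.
    ber = 0.5 * ((1 - sensitivity) + (1 - specificity))
    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance, estimated from the row and column totals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ber": ber,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fn=5, fp=10, tn=40)
print(metrics)
```

Swapping the roles of the two classes exchanges sensitivity and specificity, which is why reporting them for one class is sufficient in the binary case.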

419  3.2.  Topological  Metrics 

In addition to studying node-to-node connections, i.e., just the entries of the matrix W as stand-alone features, we would like to consider features that indicate higher levels of interaction between the studied regions.

As we do not know a priori which topological metrics would provide statistically significant differences between different classes of brain connectivity networks, we have to limit ourselves to a few selected ones, to control the FDR error within each research line. We consider 11 representative topological metrics at the global, node, and node-to-node levels (Figure 1). While only some have been studied for brain networks, all these topological features have found relevance in other disciplines, such as social networks (Easley and Kleinberg, 2010), and provide interesting insights into the overall organization of the brain.

3.2.1. Node-to-node Level

At the node-to-node level we consider the edge betweenness centrality (EBC), a new subgraph-based centrality (SGC), and the communicability measures (COM) (Estrada and Higham, 2010; Estrada, 2010). The weighted edge betweenness centrality is defined as (Rubinov and Sporns, 2010),

$$EBC_{ij} = \sum_{h,k} \frac{\rho_{hk}^{ij}}{\rho_{hk}}, \tag{4}$$

where $\rho_{hk}^{ij}$ is the number of shortest paths between nodes h and k that contain edge ij and $\rho_{hk}$ is the number of shortest paths between h and k. EBC measures the fraction of all shortest paths in the network that contain edge ij, and hence, the importance of each edge in the communication among cortical regions.

To understand the subgraph centrality (SGC) and communicability (COM) measures (Estrada and Higham, 2010; Estrada, 2010), let us first decompose the connectivity matrix as $W = \Lambda_W + \tilde{W}$, where $\Lambda_W$ is a diagonal matrix, with non-zero entries corresponding to the diagonal of W, and $\tilde{W}$ is the matrix that results from setting the diagonal of W to zero. Notice that $\Lambda_W$ contains the self-connections of each node, while $\tilde{W}$ contains the connections between each pair of nodes. Let us define (Estrada and Higham, 2010; Estrada, 2010),

$$P = e^{\tilde{W}} - I_n = \sum_{k=1}^{\infty} \frac{\tilde{W}^k}{k!}, \qquad [\tilde{W}^k]_{ij} = \sum_{h_1, \dots, h_{k-1}} w_{i h_1} w_{h_1 h_2} \cdots w_{h_{k-1} j}, \tag{5}$$

where $I_n$ is the identity matrix of size n x n and we have used the definition of the exponential of a matrix. The product $w_{i h_1} w_{h_1 h_2} \cdots w_{h_{k-1} j}$ measures the strength of the walk $(i, h_1, \dots, h_{k-1}, j)$ of length k between nodes i and j. A walk is a list of connected nodes that can be visited more than once, contrary to a path, where the nodes are visited at most once. Hence, the elements of $\tilde{W}^k$ account for the strength of all possible walks of length k between nodes i and j. Also, the entries of P correspond to the weighted sum of the strength of all possible walks of length one and higher between nodes i and j, thus providing a measure of how strong the communication is between them (communicability, Estrada and Higham 2010; Estrada 2010). Given that the number of walks increases with length, the weight 1/k! is selected to compensate for this effect, penalizing long walks.

Now, we can define (Estrada and Higham, 2010; Estrada, 2010),

$$SGC_i = [\Lambda_P]_{ii}, \qquad COM_{ij} = P_{ij}, \quad i \neq j, \tag{6}$$

where $\Lambda_P$ denotes the diagonal of P. Hence, the subgraph centrality of a node, $SGC_i$, corresponds to the communicability of a node with itself, while $COM_{ij}$ corresponds to the communicability between two different nodes $i \neq j$.

Notice that the diagonal of matrix P is a weighted sum of all closed walks (information transfer) of lengths two and higher around each node. The information provided by the closed walks of length zero in the connectivity matrix ($\Lambda_W$) is lost, however, since it is not used anywhere. To recover it, we define here $\hat{P} = P + \Lambda_W$ as the generalized communicability matrix, since it provides all possible communications among all nodes of length zero and above, without including self-loops other than the one in the starting node itself.
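The construction of the communicability matrix and its generalized version can be sketched numerically. The following minimal Python example assumes a small, hypothetical 3-node weight matrix and approximates the matrix exponential by truncating its power series after a fixed number of terms (an assumption made here to keep the sketch dependency-free; it is adequate only when the weights are small, and a library matrix-exponential routine would normally be preferred). The names `matmul`, `communicability`, and `terms` are illustrative and not taken from the original work.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][h] * b[h][j] for h in range(n)) for j in range(n)]
            for i in range(n)]


def communicability(w, terms=30):
    """Return (P, P_hat) for a weight matrix w.

    P     = sum_{k>=1} W_off^k / k!  (series truncated after `terms` terms),
    P_hat = P + diag(W), the generalized communicability matrix.
    """
    n = len(w)
    # Split W into its diagonal (self-connections) and off-diagonal part.
    diag = [w[i][i] for i in range(n)]
    w_off = [[0.0 if i == j else float(w[i][j]) for j in range(n)]
             for i in range(n)]
    p = [[0.0] * n for _ in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # W_off^0
    for k in range(1, terms + 1):
        power = matmul(power, w_off)      # W_off^k: walks of length k
        coeff = 1.0 / math.factorial(k)   # 1/k! penalizes long walks
        for i in range(n):
            for j in range(n):
                p[i][j] += coeff * power[i][j]
    p_hat = [[p[i][j] + (diag[i] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return p, p_hat


# Hypothetical toy network: a 3-node chain with self-connections.
w = [[0.5, 1.0, 0.0],
     [1.0, 0.2, 1.0],
     [0.0, 1.0, 0.1]]
p, p_hat = communicability(w)
sgc = [p[i][i] for i in range(3)]  # subgraph centrality of each node
com_01 = p[0][1]                   # communicability between nodes 0 and 1
print(sgc, com_01)
```

In this toy example nodes 0 and 2 are not directly connected, yet their communicability is non-zero because walks through node 1 contribute to it.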

The communicability matrix has no zero entries, except along the diagonal, which implies 4900 - 70 = 4830 hypothesis tests for our data (n = 70), one for each non-zero entry. To reduce this number, a spectral analysis of the communicability matrix can be performed (Estrada, 2010; Crofts and Higham, 2009) to obtain a family of tests of order O(n), where n is the number of eigenvalues of the communicability matrix. In particular, the above defined matrix COM can be decomposed in terms of its eigenvalues and eigenvectors as

$$COM = \sum_{k=1}^{n} \lambda_k v_k v_k^T, \tag{7}$$

where $\lambda_k$ are the eigenvalues of COM and $v_k$ its eigenvectors, k = 1, ..., n.

3.2.2. Global and Node Levels

The undirected network efficiency (E) and clustering coefficient (C) have been previously reported as indicative of sex and age differences (Gong et al., 2009). Here, we use the directed weighted versions, defined as (Rubinov and Sporns, 2010),

$$E = \frac{1}{n} \sum_i E_i, \qquad E_i = \frac{\sum_{j \neq i} d_{ij}^{-1}}{n - 1}, \tag{8}$$

$$C = \frac{1}{n} \sum_i C_i, \qquad C_i = \frac{\frac{1}{2} \sum_{j,h \in N_i} (w_{ih} w_{hj} w_{ji})^{1/3}}{k_i (k_i - 1) - 2 \sum_j \delta_{ij} \delta_{ji}}, \tag{9}$$

$$\delta_{ij} = \begin{cases} 0 & \text{if } w_{ij} = 0 \\ 1 & \text{if } w_{ij} > 0 \end{cases}, \qquad k_i = \sum_j (\delta_{ij} + \delta_{ji}),$$

where n represents the number of nodes, $d_{ij}$ the weighted directed shortest path length between nodes i and j, and $N_i$ the neighborhood of node i (nodes connected to node i by a single link). Network efficiency measures how fast information can be transmitted in the network, globally (E) and locally at each node ($E_i$). The clustering coefficient measures how much nodes in a graph tend to cluster together, globally (C) and locally at the node level ($C_i$). Basically, the directed weighted clustering coefficient measures the probability that neighbors of a node are also connected between themselves, hence forming clusters around a node.

Additional traditional topological metrics at the global and node levels are the weighted directed betweenness centrality (BC), weighted modularity (Q), and motifs (Rubinov and Sporns, 2010). The weighted directed node betweenness centrality is defined as (Rubinov and Sporns, 2010),

$$BC = \frac{1}{n} \sum_i BC_i, \qquad BC_i = \frac{1}{(n-1)(n-2)} \sum_{h,j \in N_i;\ i \neq j \neq h} \frac{\rho_{hj}^{(i)}}{\rho_{hj}}, \tag{10}$$

where $\rho_{hj}^{(i)}$ represents the number of shortest paths between nodes h and j that go through i, and $\rho_{hj}$ the total number of shortest paths between h and j. The directed weighted node betweenness centrality measures how important each node is in the communication between neighboring nodes.

The weighted modularity (Q) is defined as (Rubinov and Sporns, 2010),

$$Q = \frac{1}{l} \sum_{i,j} \left[ w_{ij} - \frac{k_i^{out} k_j^{in}}{l} \right] \delta_{M_i M_j}, \tag{11}$$

where, in the notation of Rubinov and Sporns (2010), l is the sum of all weights in the network and $k_i^{out}$ and $k_j^{in}$ are the weighted out- and in-degrees. The network is assumed to be fully subdivided into non-overlapping clusters or modules (M), with $M_i$ being the module that contains node i, and $\delta_{M_i M_j} = 1$ if $M_i = M_j$ and zero otherwise. This is a global measure of the modularity of the network, that is, how tightly nodes are connected within a module. Identifying modules is of course a first step in analyzing the structure of the brain at a higher scale. This global topological measure has a local hierarchical representation, where we can have hierarchies of modules (clusters). Modules can be found using, for instance, the Louvain hierarchical modularity algorithm (Blondel et al., 2008), a graph partitioning algorithm that tries to find the partition maximizing Equation (11). Since graph partitioning is in general an NP-complete problem, the Louvain algorithm computes a local optimum by greedy optimization. Figure S1, in the supplementary material, is an example of hierarchical module graph partitioning using the full data set.

Network motifs (Rubinov and Sporns, 2010; Onnela et al., 2005) are also topological metrics that measure the intensity or frequency of certain subgraph patterns, such as directed connections forming a triangle, a square, etc. The intensity of a weighted motif ($F_{motif}$) is defined as,

$$F_{motif} = \sum_h F_{motif}^h = \sum_h \Big( \prod_{i,j \in L_{motif}^h} w_{ij} \Big)^{1/|L_{motif}|}, \tag{12}$$

where motif indicates a given motif, h a node, $L_{motif}^h$ the set of nodes forming the motif at node h, and $|L_{motif}|$ the number of directed links in the motif. Motifs are considered the building blocks of information processing in the network and can be measured globally ($F_{motif}$) or locally at the node level ($F_{motif}^h$). Figure S2, in the supplementary material, shows the 13 possible directed motifs of size three.

Some topological metrics, while popular in studies of other network data, have not yet been used for anatomical brain networks. Here we also consider the PageRank (PR) (Lohmann et al., 2010; Easley and Kleinberg, 2010; Shepelyansky and Zhirov, 2010) and the Rentian scale (Bassett et al., 2010). In essence, the PageRank (critical in Internet network analysis and search engine performance) is a measure of how important a node is, based on the importance of its neighbors. Hence, this is a recursive metric that starts with all the nodes having the same measure of importance. More formally (Brin and Page, 1998),

$$PR(t) = \frac{1}{n} \sum_i PR_i(t), \qquad PR_i(t+1) = (1 - \alpha) + \alpha \sum_{j \in N_i} \frac{PR_j(t)}{k_j^{out}}, \tag{13}$$

where again n is the number of nodes, $N_i$ the neighborhood of node i, $k_j^{out}$ the number of outgoing links of node j, $\alpha$ is a damping parameter set in the [0, 1] range, and t = 1, 2, ... the iterations until convergence, defined as $|PR(t+1) - PR(t)| < \epsilon$, for some small number $\epsilon$. The PageRank tries to identify nodes that are influential in the network, not only because they have many connections with other nodes, but also because those neighboring nodes are influential themselves. This may be a better definition of node importance than traditional hubs, which account only for the number of connections of a node (node degree).
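The recursive character of the PageRank can be illustrated with a short fixed-point iteration. The following Python sketch assumes the classical Brin and Page update on a binary directed graph, in which each node shares its score equally among its outgoing links. The damping value 0.85, the convergence test on the summed per-node change, the function name `pagerank`, and the toy graph are all assumptions made for illustration; the weighting actually used for the brain networks in this study may differ.

```python
def pagerank(adj, alpha=0.85, eps=1e-10, max_iter=1000):
    """Iterate PR_i(t+1) = (1 - alpha) + alpha * sum_j PR_j(t) / outdeg(j).

    adj[j][i] > 0 means there is a directed link from node j to node i.
    """
    n = len(adj)
    out_deg = [sum(1 for x in row if x > 0) for row in adj]
    pr = [1.0] * n  # all nodes start with the same importance
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Only nodes j that link to i contribute, so out_deg[j] >= 1.
            incoming = sum(pr[j] / out_deg[j]
                           for j in range(n) if adj[j][i] > 0)
            new.append((1 - alpha) + alpha * incoming)
        # Stop when the total change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(new, pr)) < eps:
            return new
        pr = new
    return pr


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
adj = [[0, 1, 1],
       [0, 0, 1],
       [1, 0, 0]]
scores = pagerank(adj)
print(scores)
```

In this toy graph node 2 receives links from both other nodes and obtains the highest score, while node 0 ranks above node 1 because its only incoming link comes from the most important node.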

The Rentian scale (see footnote 11) is a measure of the wiring modular complexity of the network that is self-similar (fractal) at different scales. This is a metric of modularity that differs from the previous one (Q) in that it is hierarchically represented as modules within modules at different network scales. More formally (Bassett et al., 2010),

$$EC = k N^r, \tag{14}$$

where EC is the number of external connections to a module, k a proportionality constant, N the number of nodes in the module, and r the Rentian exponent. Here, we use the physical Rentian scale, which uses the physical coordinates of the brain cortical regions. In order to avoid introducing the obvious differences in brain size due to sex, we use the same physical coordinates for all brain cortical regions, corresponding to a single brain. The Rentian scale is computed as the mean Rentian exponent in Equation (14), by partitioning the network into halves, quarters, and so on in physical space, providing EC and N values at different scales. The constant k and Rentian scale r are computed by least squares minimization of the linearized Equation (14), log(EC) = log(k) + r log(N), for all values of EC and N obtained from such partitions (Bassett et al., 2010).

11 The Rentian scale does not use the actual weights or the direction information.
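The least squares fit of the linearized Equation (14) amounts to an ordinary linear regression in log-log space. The following Python sketch shows this step only, assuming that the pairs (N, EC) have already been obtained from the spatial partitioning. The function name `fit_rent` and the synthetic data, generated from an exact power law with k = 3 and r = 0.75, are hypothetical.

```python
import math


def fit_rent(ns, ecs):
    """Least squares fit of log(EC) = log(k) + r * log(N); returns (k, r)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(ec) for ec in ecs]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    r = sxy / sxx                        # slope: Rentian exponent
    k = math.exp(y_mean - r * x_mean)    # intercept: proportionality constant
    return k, r


# Synthetic module sizes and external connection counts (illustration only).
ns = [2, 4, 8, 16, 32]
ecs = [3.0 * n ** 0.75 for n in ns]
k, r = fit_rent(ns, ecs)
print(k, r)
```

With real partitions the points scatter around the fitted line, so the quality of the fit should be inspected before interpreting r.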

Some node-to-node topological metrics can lead to global metrics. For instance, the trace of $\Lambda_P$ is a global measure of node importance called the Estrada index. The EBC can also be made global by averaging it over the entire network. Nevertheless, this kind of large averaging might destroy local differences at the edge level and will not be considered here.

569  3.3.  FDR  Error  Control 

570  3.3.1.  Single  Family  of  Hypothesis  Testing 

To control the FDR for the single families of hypotheses corresponding to the research lines “are there any global topological differences between real brain connectivity networks and random networks?” and “are there any mean differences between connectivity matrices due to sex and kinship?”, we use here the linear step-up algorithm of Benjamini-Hochberg (Benjamini and Hochberg, 1995), hereafter BH-FDR. The BH-FDR algorithm has been applied in many recent multiple hypothesis testing studies, including brain connectivity analysis (Gong et al., 2009; He et al., 2007; Jahanshad et al., 2010).

Other approaches to control the FDR in multiple hypothesis testing that are less conservative than the BH-FDR algorithm have been proposed in the literature (Storey, 2002; Storey et al., 2004; Westfall et al., 1997; Benjamini and Hochberg, 2000; Benjamini and Yekutieli, 2001, 2005), but they require either independence of the hypotheses being tested or a known correlation structure (Reiner-Benaim, 2007). The BH-FDR algorithm is still the most widely used, as it is simple and it controls the FDR for normally distributed tests with any correlation structure (Benjamini et al., 2009; Reiner-Benaim, 2007). As we are working with mean differences in a large number of connectivity matrices, we can assume that the mean follows a normal distribution, by the central limit theorem (Fisher, 2011). Hence, the simple BH-FDR error control is quite appropriate here. For completeness, we provide here the basic BH-FDR algorithm (Benjamini and Hochberg, 1995; Yekutieli, 2008):

Algorithm 1 BH-FDR

1. Sort in increasing order all the p-values of the null hypotheses: $p_1 \le p_2 \le \dots \le p_L$.

2. Let $r = \max\{i : p_i \le i q / L\}$ and define the threshold $p_{th} = p_r$. If no r could be found, define $p_{th} = q/L$ (pure Bonferroni).

3. Reject all null hypotheses with $p_i \le p_{th}$.
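The three steps of the BH-FDR algorithm translate almost line by line into code. The following Python sketch assumes the standard Benjamini-Hochberg step-up rule, in which the i-th smallest p-value is compared with i q / L, where L is the number of hypotheses and q the chosen level. The function name `bh_fdr` and the p-values in the example are hypothetical.

```python
def bh_fdr(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns the threshold p_th and one rejection flag per input p-value.
    """
    L = len(p_values)
    ordered = sorted(p_values)  # step 1: increasing order
    # Step 2: largest rank r (1-based) such that p_(r) <= r * q / L.
    r = 0
    for i, p in enumerate(ordered, start=1):
        if p <= i * q / L:
            r = i
    # If no rank qualifies, fall back to the Bonferroni threshold q / L,
    # in which case nothing is rejected.
    p_th = ordered[r - 1] if r > 0 else q / L
    # Step 3: reject every hypothesis whose p-value is at most p_th.
    return p_th, [p <= p_th for p in p_values]


# Hypothetical p-values for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
p_th, rejected = bh_fdr(p_values, q=0.05)
print(p_th, sum(rejected))
```

In this example only the two smallest p-values are rejected, although five of them fall below the uncorrected 0.05 level.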


In Algorithm 1, L is the number of null hypotheses and q the desired family-wise confidence level.

3.3.2. Multiple Families of Hypothesis Testing

As explained before, we have a tree of topological metrics at different levels of resolution (Figure 1). Hence, we need to test each topological metric at the global, node, and node-to-node levels. Nevertheless, testing the topological metrics at the node and node-to-node levels consists of testing families of hypotheses of sizes O(n) and O(n^2), respectively, where n corresponds to the number of nodes in the network. Hence, we have multiple families of hypothesis tests and we need to control the overall FDR on each of the proposed research lines.

The FDR error control has been limited so far to a single family of multiple hypothesis tests. The implicit assumption in many large studies has been that there is no need to control the FDR when multiple families of hypotheses are tested on the same data set, other than the FDR control on each family of hypotheses (Yekutieli, 2008). However, in general, FDR control separately applied to each family of hypotheses does not imply FDR control for the entire study (Benjamini and Yekutieli, 2005; Yekutieli, 2008). If a separate control of the FDR is performed on each family of hypotheses, then the overall FDR error corresponds to the sum of the FDR errors of each family, which can quickly make the overall p-value of the study too large to be of any use. As we compare different topological metrics at different levels, we have different families of multiple hypothesis tests that require overall control of the FDR for each research line.

To control the overall FDR error, we proceed in a hierarchical way, testing from lower to higher resolutions, as suggested by Yekutieli et al. (2006) and Yekutieli (2008). This strategy makes sense since it avoids testing first at higher resolutions, where the number of hypotheses to be tested on each family could go up to 4900 (n = 70). If the fraction of null rejections is small, then the FDR error control becomes as stringent as Bonferroni correction (Yekutieli, 2008), which significantly increases the chance of not rejecting any false null hypotheses (false negatives or Type II error).

625  Figure  1  shows  the  tree  of  possible  hypotheses  while  testing  the  topolog- 

626  ical  differences  due  to  sex  and  kinship  at  three  levels:  global,  node  (corti- 

627  cal  regions),  and  node-to-node  (shortest  paths  and  communicability).  The 

628  dashed  lines  on  Figure  1  indicate  that  the  higher  resolution  hypotheses  are 

629  only  tested  if  the  parent  null  hypothesis  was  rejected,  as  indicated  by  (Yeku- 

630  tieli,  2008). 

A specific example (see Figure 1) is the communicability matrix (COM), which contains O(n^2) non-zero entries, and hence O(n^2) hypotheses to test. We can instead test its eigenvectors (Equation (7)), which requires only O(n) hypothesis tests to determine if COM might be significant.

Let $H^0 = \{H_i^0,\ i = 1, \dots, L_0\}$ be the set of hypotheses to be tested at the lowest resolution level, and $H^k = \{H_i^k,\ i = 1, \dots, L_k \mid j \in H^{k-1}\}$ be the set of hypotheses at resolution levels k = 1, ..., K, where j denotes the parent hypothesis at level k - 1. In our case, K = 2, where k = 0 corresponds to the topological metrics at the global level, k = 1 to the topological metrics at the node level, and k = 2 to the topological metrics at the node-to-node level (again, see Figure 1). Hence, we have a hierarchy of hypotheses, where the FDR error is controlled at each level simultaneously on all families of hypotheses, using the BH-FDR algorithm (see Section 3.3.1), imposing as mentioned above the condition that higher resolution hypotheses are tested only if the parent hypothesis has been rejected.

Suppose that the p-values corresponding to the hypotheses being tested are independently distributed, that the p-values of true null hypotheses have uniform distributions, and that, for false null hypotheses, the conditional marginal distribution of all the p-values is uniform or stochastically smaller than uniform (Yekutieli, 2008). In such cases, the overall FDR for the whole tree of hypotheses is bounded by $FDR \le 2 \delta q$, where q is the family-wise confidence level and $\delta \approx 1.0$ for most cases, but can be as large as $\delta \approx 1.4$ for thousands of hypotheses with few discoveries. Hence, controlling the FDR on each level at q = 0.05 will bound the overall FDR at 0.1 in most cases, or at 0.14 when thousands of hypotheses are tested and the number of discoveries is relatively small compared to the number of hypotheses tested (see Yekutieli 2008).

Testing for all the required conditions on the p-values and computing $\delta$ to bound the overall FDR as defined before is a daunting task that has been tackled in the past by modeling and multiple simulations with synthetic data (Yekutieli, 2008; Reiner-Benaim et al., 2007). Instead, we can use the fact that the bound of the overall FDR is the sum over k = 0, ..., K of the bounds for the FDR at each level, FDR(k) (Yekutieli et al., 2006; Yekutieli, 2008). Hence, the overall tree $FDR \le (K + 1) q$, where K + 1 is the number of levels in the tree. Here K = 2, hence $FDR \le 3q = 0.15$ for a family-wise confidence level of 0.05 at each level, which is quite close to the predicted (most conservative) theoretical overall bound of 0.14 with $\delta = 1.4$.
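The hierarchical procedure described in this section can be sketched as a level-by-level traversal of the hypothesis tree. The following Python example is one possible reading, in which all hypotheses at a given level whose parents were rejected are pooled and tested together with BH-FDR at level q. The tree layout, the dictionary keys, the function names `bh_reject` and `hierarchical_test`, and the p-values are hypothetical and only illustrate the control flow.

```python
def bh_reject(p_values, q):
    """Benjamini-Hochberg step-up test; one rejection flag per p-value."""
    L = len(p_values)
    ordered = sorted(p_values)
    r = max((i for i, p in enumerate(ordered, start=1) if p <= i * q / L),
            default=0)
    p_th = ordered[r - 1] if r else q / L
    return [p <= p_th for p in p_values]


def hierarchical_test(roots, q=0.05):
    """Test a tree of hypotheses from the lowest to the highest resolution.

    Each node is a dict with keys 'name', 'p', and optionally 'children'.
    Children are tested only if their parent hypothesis was rejected.
    """
    rejected = []
    level = list(roots)
    while level:
        flags = bh_reject([node["p"] for node in level], q)
        next_level = []
        for node, flag in zip(level, flags):
            if flag:
                rejected.append(node["name"])
                next_level.extend(node.get("children", []))
        level = next_level
    return rejected


# Hypothetical tree: two global metrics, each with node-level children.
tree = [
    {"name": "C", "p": 0.01, "children": [
        {"name": "C_1", "p": 0.004},
        {"name": "C_2", "p": 0.5},
    ]},
    {"name": "E", "p": 0.3, "children": [
        {"name": "E_1", "p": 0.0001},  # never tested: parent not rejected
    ]},
]
rejected = hierarchical_test(tree, q=0.05)
print(rejected)

# Simple bound on the overall FDR for a tree with K + 1 = 3 levels.
overall_bound = 3 * 0.05
```

Note that the hypothesis E_1 is never examined despite its small p-value, because its parent was not rejected; this is the price paid for keeping the number of tests at the higher resolution levels small.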

3.3.3. Screening

Despite the overall control of the FDR described before, for large studies it
is quite possible that the BH-FDR control would become equivalent to a simple
(too conservative) Bonferroni correction, and no single null hypothesis could
be rejected (Benjamini and Yekutieli, 2005). Most large studies, e.g., the
expression levels of thousands of genes in microarrays, nowadays use screening
methods to reduce the number of hypotheses tested, improving the overall
statistical power of the FDR control, especially when the fraction of
rejections of the null hypothesis is small (Benjamini and Yekutieli, 2005).
Screening to eliminate some uninteresting hypotheses is valid, so long as the
null hypothesis of the screening method is independent of the null hypothesis
being tested (Yekutieli, 2008). Since the null hypothesis in most tests is
that mean differences are zero, a valid screening method is an ANOVA single
effects F-ratio screening (Reiner-Benaim et al., 2007), in which the null
hypothesis depends on the variance of the data (see details in Appendix).

In addition to reducing the number of hypotheses to be tested, it has also
been proposed to threshold the connectivity matrices themselves to remove
noisy connections, thus avoiding unnecessary tests on those connections. To
avoid ad-hoc thresholds, we screen the connectivity matrix using a set of
increasing thresholds that produce different connectivity matrices at
different sparsity levels (Rubinov and Sporns, 2010; Bullmore and Bassett,
2010; Achard and Bullmore, 2007; Bassett et al., 2008). This data screening
technique reveals statistical differences at different levels of sparsity that
are not seen with a single ad-hoc threshold (Gong et al., 2009). Optionally, a
single robust threshold can be used on the connectivity matrices themselves,
using the BH-FDR error control (Abramovich and Benjamini, 1996). Here, we
screen the normalized connectivity matrices with thresholds in the [0, 0.05]
range,¹² as in (Gong et al., 2009), given that the BH-FDR based threshold is
too stringent and may miss important discoveries. Figure S3 illustrates how
these thresholds affect the sparsity of the thresholded matrices.
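
The threshold sweep described above can be sketched as follows. This is an
illustrative example, not the authors' code: the toy matrix, the helper names
`threshold_matrix` and `density`, and the choice of a strict "greater than"
comparison are all assumptions.

```python
def threshold_matrix(matrix, threshold):
    """Zero every entry whose connection weight does not exceed the threshold."""
    return [[w if w > threshold else 0.0 for w in row] for row in matrix]


def density(matrix):
    """Fraction of off-diagonal entries that are non-zero (1 - sparsity)."""
    n = len(matrix)
    nonzero = sum(
        1 for i in range(n) for j in range(n) if i != j and matrix[i][j] > 0
    )
    return nonzero / (n * (n - 1))


# Toy 4-node directed matrix, normalized to [0, 1]; the values are made up.
connectivity = [
    [0.00, 0.30, 0.02, 0.00],
    [0.25, 0.00, 0.04, 0.01],
    [0.03, 0.05, 0.00, 0.40],
    [0.00, 0.01, 0.35, 0.00],
]

thresholds = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
densities = [density(threshold_matrix(connectivity, t)) for t in thresholds]
for t, d in zip(thresholds, densities):
    print(f"threshold={t:.2f}  density={d:.3f}")
```

Each threshold yields one matrix per subject, and the statistical tests are
then repeated at every sparsity level rather than at a single ad-hoc cut.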

Here, we use the simple screening method of thresholding the connectivity
matrices at different sparsity levels proposed by (Rubinov and Sporns, 2010;
Bullmore and Bassett, 2010; Achard and Bullmore, 2007; Bassett et al., 2008),
given its simplicity and independence of the hypothesis being tested. Then, we
apply an ANOVA single effects F-ratio screening test to eliminate the
remaining uninteresting hypotheses (see Appendix for details). This kind of
selective inference has not yet received proper theoretical or practical
consideration in the context of screening uninteresting hypotheses and the
less obvious connection between the screening test and the follow-up one
(Reiner-Benaim, 2007; Benjamini et al., 2009). Better FDR error control
algorithms are needed, especially for cases where the number of null
hypotheses is large and the FDR methods reduce to a simple Bonferroni
correction.
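
A single-factor F-ratio screen of the kind mentioned above could look like the
sketch below. The actual procedure is described in the Appendix, so everything
here is an assumption made for illustration: the helper names, the toy data,
and the use of a fixed critical value (5.14 is approximately the 0.05 critical
value of an F distribution with 2 and 6 degrees of freedom).

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def screen(hypotheses, critical_f):
    """Keep only the hypotheses whose F-ratio exceeds the critical value."""
    return [name for name, groups in hypotheses.items() if f_ratio(groups) > critical_f]


# Hypothetical metrics measured on three groups of three subjects each.
hypotheses = {
    "metric_a": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]],  # groups differ
    "metric_b": [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [1.0, 2.5, 3.0]],  # groups similar
}
kept = screen(hypotheses, critical_f=5.14)
print(kept)
```

Only the hypotheses that survive the screen are passed on to the BH-FDR
procedure, which reduces the number of tests that the correction must absorb.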

¹² Recall that the normalized connectivity matrices are all in the [0, 1]
range.

3.3.4. Bootstrapping

We now describe how we compute the p-values that the BH-FDR error control
requires. As we are working with average connectivity and topological network
differences between different groups of individuals (including pairs of
individuals), by the central limit theorem those averages should
asymptotically follow a Gaussian distribution (Fisher, 2011). Nevertheless,
there could be some small deviations from the Gaussian distribution on real
finite samples, so we use a non-parametric approach. Bootstrapping can improve
the reliability of inference compared with conventional asymptotic tests
(Davison and MacKinnon, 1999). We use bootstrapping with replacement to obtain
20,000 samples of the mean for each metric, scale, and class. The p-values (p)
required by the BH-FDR error control can be easily computed from the
bootstrapped distribution of the mean differences,

$$ p = \frac{c}{B}\,\min\left\{ \sum_{i=1}^{B} I(s_i)\ \text{s.t.}\ s_i > 0,\ \ \sum_{i=1}^{B} I(s_i)\ \text{s.t.}\ s_i < 0 \right\}, \qquad (15) $$

where B is the number of bootstrapped samples, c = 1 for single-tailed tests,
c = 2 for double-tailed tests, $s_i$ are the bootstrapped sample differences,
and $I(s_i)$ the frequency of those samples. Sample differences are, for
instance, differences in the clustering coefficient at a given brain region
(node) i, or differences in the communicability matrix taken as a column
vector at the entry i, due to sex. As in (Gong et al., 2009), we consider
positive and negative differences in the connectivity matrices and topological
metrics of the associated digraphs for both sex and kinship differences, so we
use one-tailed p-values.
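
The bootstrapped p-value of Equation (15) can be sketched as below. This is a
minimal illustration under stated assumptions: $I(s_i)$ is read as a count of
one per bootstrapped difference, the function name and the toy values are
hypothetical, and each group is resampled independently.

```python
import random


def bootstrap_p_value(group_a, group_b, n_boot=20000, tails=1, seed=0):
    """p = (c / B) * min(#{s_i > 0}, #{s_i < 0}) over bootstrapped mean differences."""
    rng = random.Random(seed)
    positive = negative = 0
    for _ in range(n_boot):
        # Resample each group with replacement and compare the resampled means.
        mean_a = sum(rng.choices(group_a, k=len(group_a))) / len(group_a)
        mean_b = sum(rng.choices(group_b, k=len(group_b))) / len(group_b)
        diff = mean_a - mean_b
        if diff > 0:
            positive += 1
        elif diff < 0:
            negative += 1
    return tails * min(positive, negative) / n_boot


# Made-up clustering coefficients at one node for two groups of subjects.
group_1 = [0.52, 0.61, 0.58, 0.66, 0.55, 0.63]
group_2 = [0.41, 0.47, 0.39, 0.50, 0.44, 0.46]

p_separated = bootstrap_p_value(group_1, group_2)
p_same = bootstrap_p_value(group_2, group_2)
print(p_separated, p_same)
```

When no resampled difference changes sign, the estimate is zero, which should
be read as "smaller than 1/B" rather than as an exact probability.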

3.3.5. Z-scores Global Topological Metrics

As the global topological metrics of the brain connectivity networks and their
corresponding random networks are independent, the Z-score of their
differences is

$$ Z = \frac{\bar{M} - \bar{M}_r}{[\ldots]}, \qquad (16) $$

where $\bar{M}$ indicates the mean of metric M and $\bar{M}_r$ the mean metric
for the corresponding random network. Here we use a parametric t-test, as
there are enough samples of the population to assume Gaussianity, and to be
consistent with previous results comparing real and random networks (Rubinov
and Sporns, 2010; Boccaletti et al., 2006).
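
One common way to compute such a Z-score for two independent samples is
sketched below. The denominator used here, the standard error of the
difference of the two means, is an assumption and may differ from the exact
form of Equation (16); the function name and the toy values are hypothetical.

```python
import math
import statistics


def z_score_difference(real_values, random_values):
    """Z-score of the difference between two independent sample means.

    Assumption: the denominator is the standard error of the difference,
    sqrt(var_real / n_real + var_random / n_random).
    """
    mean_real = statistics.mean(real_values)
    mean_random = statistics.mean(random_values)
    standard_error = math.sqrt(
        statistics.variance(real_values) / len(real_values)
        + statistics.variance(random_values) / len(random_values)
    )
    return (mean_real - mean_random) / standard_error


# Made-up global clustering coefficients for real and rewired networks.
clustering_real = [0.6, 0.7, 0.8]
clustering_random = [0.2, 0.3, 0.4]
z = z_score_difference(clustering_real, clustering_random)
print(round(z, 3))
```

Because the two samples are independent, their variances add, which is the
only property of the denominator that the text itself states.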

4. Results

We show here the results obtained from the 303 HARDI-derived connectivity
matrices, with a formal statistical analysis of the topological features as
described before. For space considerations, the detailed lists of features are
presented in the supplement, with corresponding p-values and mean differences.

The figures in the next sections showing the features selected by the machine
learning methods described in Section 3.1 are color coded according to the
score provided by the feature selection algorithm. This score accounts for the
effect of each feature on the classification accuracy and its stability across
the n-fold cross-validation runs (see more details on the tools employed in
the Appendix). We do not indicate here which are the top ranked features,
since all the selected features are important for classification purposes,
even those ranked the lowest. For instance, if we took only the 10 top ranked
features and used them for classification, the performance would be relatively
poor.

Figures in the next sections showing the statistically significant features
found in hypothesis testing (Section 3.3) are color coded according to their
Z-score and the sign of the difference, magenta for positive and cyan for
negative. As the sign of the difference depends on the order of the operands,
we specify in the corresponding text and on each figure the meaning of each
color.¹³

4.1. Classification

Tables S2-S4 compare the classification results for the three node-to-node
level metrics considered here, the “raw” connectivity matrices, the
generalized communicability matrix (P), and edge betweenness (EBC), using the
three normalizations indicated in Section 2. The performance of sex
classification for the connectivity matrices, generalized communicability, and
edge betweenness, using Equation (3), is 93%, 92.2%, and 92.5%, respectively.
The corresponding performances for Equation (1) are 88.1%, 88.1%, and 93.7%,
respectively, and for Equation (2) are 89.9%, 88.3%, and 80.7%, respectively.
The performance of kinship classification for the connectivity matrices,
generalized communicability, and edge betweenness, using Equation (3), is
88.5%, 88.5%, and 87.3%, respectively. The corresponding performances for
Equation (1) are 89.7%, 85.8%, and 75.2%, respectively, and for Equation (2)
are 87.4%, 83.6%, and 75.5%, respectively.

Notice that, in some cases, Equation (1) produces slightly better
classification results than Equation (3); however, as indicated in the
Appendix, only Equations (2)-(3) significantly reduce the confounding effects
of brain size. In addition, Equation (3) produces the best overall
classification results, considering all the classes and topological metrics.

¹³ Recall that for the kinship classes, we will be comparing connectivity
matrices that represent the absolute connectivity differences within each
group, and not the connectivity of each individual or pairs of individuals.
Hence, differences between two kinship classes refer here to differences
between the two means of the within-group differences.

Classification performance was just slightly better than chance for all
topological metrics at the node level (Figure 1), and hence, they were not
compared here using Equations (1)-(3). The next sections show in more detail
the classification results using Equation (3).

4.1.1. Connectivity Matrices

We start with the classification results when the “raw” connectivity matrices
are used, one per individual and one per pair of individuals. Table 1 and
Table S5 (for the confusion matrix, provided in the supplementary material)
compare sex classification performance using all features (probabilities of
connection between the n = 70 cortical regions) of the connectivity matrix
against feature selection. Feature selection greatly improves classification
performance: the selected features provide more information to distinguish
between sexes. Overall, classification accuracy improved from 49.5% using up
to 2763 features of the connectivity matrices, to 93% after feature selection
that reduced the number of features to 297. According to our permutation
tests, the probability of achieving this classification performance by chance
is 0.001 or lower. Figure 2a shows the features that provide the best
classification results for sex in the raw connectivity matrix. Table S7 in the
supplement lists the selected features in more detail.
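
The permutation test behind the "0.001 or lower" statement can be sketched as
follows. This is an illustration under stated assumptions, not the pipeline
used in the study: the classifier is a toy nearest-class-mean rule scored on
the training data, the data are made up, and the number of permutations (999)
is chosen only so that the smallest attainable p-value is 1/1000.

```python
import random


def accuracy(features, labels):
    """Accuracy of a toy nearest-class-mean classifier (illustrative only)."""
    classes = sorted(set(labels))
    means = {
        c: sum(f for f, l in zip(features, labels) if l == c) / labels.count(c)
        for c in classes
    }
    predicted = [min(classes, key=lambda c: abs(f - means[c])) for f in features]
    return sum(p == l for p, l in zip(predicted, labels)) / len(labels)


def permutation_p_value(features, labels, n_permutations=999, seed=0):
    """Fraction of label shufflings that score at least as well as the true labels."""
    rng = random.Random(seed)
    observed = accuracy(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(features, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the unpermuted labels as one of the arrangements.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Made-up one-dimensional feature for two well-separated classes.
features = [0.1 * i for i in range(10)] + [2.0 + 0.1 * i for i in range(10)]
labels = [0] * 10 + [1] * 10
p_value = permutation_p_value(features, labels)
print(p_value)
```

With this estimator the p-value can never be smaller than 1 / (n_permutations
+ 1), so a reported bound of 0.001 is consistent with no shuffled labeling
matching the observed accuracy in roughly a thousand permutations.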

The feature selection algorithm selected 70 inter-hemispheric features as
influential for sex classification purposes and about the same number of
features on the left (113) and right (114) hemispheres (Figure 2a).

Table 2 and Table S6 (for the confusion matrix, in the supplementary material)
compare kinship classification performance using all features of the
connectivity matrix versus feature selection. Here, the overall classification
accuracy improved from 63.5% using up to 2763 features of the connectivity
matrix to 88.5% using the 250 features automatically selected by feature
selection. Permutation tests indicate that the probability of arriving at this
classification performance by chance is equal to or below 0.001. Figure 2b
shows the features that provide the best classification results for kinship in
the connectivity matrix. Table S8 in the supplementary material lists the
corresponding selected features in more detail.

The feature selection algorithm selected 59 inter-hemispheric features as
influential for kinship classification purposes and about the same number of
features on the left (97) and right (94) hemispheres (Figure 2b).

4.1.2. Topological Metrics

The best results at the node level correspond to the clustering coefficient
for sex classification, as indicated in Table 3. Overall classification
accuracy improved from 55.4% using the clustering coefficient on all 70 nodes
to 62.7% using the 53 nodes (not a significant reduction) selected using
automatic feature selection.

On the other hand, good classification results were obtained for sex and
kinship using the node-to-node topological metrics: edge betweenness
centrality (EBC) and the generalized communicability matrix (P). The results
from the generalized communicability matrix are slightly better than those
using EBC for sex, while those from EBC are slightly better for kinship.
Hence, we present here the best classification performances.

Table 4 and Table S9 in the supplement (confusion matrices) show the sex
classification performance using the generalized communicability matrix. For
comparison purposes, we also compute the classification performance using FDR
(Abramovich and Benjamini, 1996) to select the most statistically significant
elements of the generalized communicability matrix at the q = 0.05 level. Sex
classification accuracy improved from 51.8% using all 4900 features of the
generalized communicability matrix to 92.2%¹⁴ using the 301 features
automatically selected by feature selection. The overall accuracy of sex
classification degraded to 46.2% using the 935 features selected by FDR
thresholding.

Table 5 and Table S10 in the supplement show the kinship classification
performance using edge betweenness centrality, where, as before, we included
the classification performance using FDR for feature selection. The overall
kinship classification accuracy improved from 57.1% using 2388 features of
the EBC to 87.3% using the 251 features selected by feature selection. The
overall accuracy of kinship classification degraded to 32.1% using the 1031
features selected by FDR thresholding.

Figure 3a shows the 301 features (entries) of the generalized communicability
matrix that provide the best classification results for sex (listed in more
detail in Table S11), while Figure 3b shows the 251 features (edges) of the
EBC metric that provide the best classification results for kinship (listed in
more detail in Table S12). The 301 best entries of the communicability matrix
for sex classification represent weighted walks of different lengths (or
subgraphs, see Section 3.2.1) centered on the connections indicated in Figure
3a.

¹⁴ Notice in Tables S3-S4 that EBC has a slightly higher classification
accuracy than communicability, but it has a higher BER; hence we choose here
the generalized communicability matrix.
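
Footnote 14 weighs accuracy against BER when choosing between two metrics. The
sketch below shows how the two numbers can disagree. It assumes that BER
stands for the balanced error rate (the mean of the per-class error rates);
the confusion matrices are made up and are not taken from the supplementary
tables.

```python
def accuracy(confusion):
    """Fraction of correctly classified samples (diagonal over total)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def balanced_error_rate(confusion):
    """Mean of per-class error rates; rows are true classes, columns predictions."""
    per_class_error = [1 - row[i] / sum(row) for i, row in enumerate(confusion)]
    return sum(per_class_error) / len(per_class_error)


balanced = [[45, 5], [10, 40]]  # two classes of equal size
skewed = [[90, 10], [5, 5]]     # second class is small and poorly classified

for name, matrix in (("balanced", balanced), ("skewed", skewed)):
    print(name, round(accuracy(matrix), 3), round(balanced_error_rate(matrix), 3))
```

In the skewed case the accuracy is higher than in the balanced case even
though one class is classified at chance level, which is the situation in
which a lower BER is the safer criterion.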

The automatically selected entries of the communicability matrix were
distributed as 99 centered on inter-hemispheric connections, 116 centered on
the left hemisphere, and 86 on the right hemisphere. On the other hand, the
251 entries of the EBC for kinship classification represent (see Section
3.2.1) the importance of each connection in the connectivity matrix in terms
of the shortest paths using such connections. In particular, the selected
entries of the EBC were distributed as (Figure 3b) 51 inter-hemispheric, 94
in the left hemisphere, and 107 in the right hemisphere.

Even though classification with cross-validation does not require Bonferroni
correction, the p-values of the permutation tests do require correction, as
each permutation test corresponds to testing the null hypothesis that the
reported classification performance was obtained by chance (Ojala and Garriga,
2010). In these two lines of research (sex and kinship), we performed
permutation tests for the 11 proposed topological metrics (not all shown here)
indicated in Figure 1 at the node and node-to-node levels, plus the
permutation tests performed to compare Equations (1)-(3) and those to compare
the generalized communicability matrix with the communicability matrix (also
not shown, to save space). Hence, we performed in total 13 permutation tests
for sex and 13 for kinship. The BH-FDR correction keeps the overall false
discovery rate for the permutation tests at 0.001, since all tests rejected
the null hypothesis at this confidence level.

4.2. Hypothesis Testing

4.2.1. Connectivity Matrices

We now present the results of hypothesis testing on differences in the
connectivity matrix due to sex and kinship. Prior work on connectivity
matrices for differentiating sex and kinship classes has focused on just a few
connections (10) (Jahanshad et al., 2011). Previous work also did not consider
all possible pair-wise comparisons between identical twins, non-identical
multiples, non-twin siblings, and unrelated subjects.

Sex Differences. Figure 4 shows the 36 statistically significant sex
differences found in the connectivity matrices after BH-FDR error control,
requiring a Z-score of 1.75 or higher (p-value of 0.0405 or lower, for a
single-tailed normal distribution). The color map indicates where the
probability of connection is higher for women (magenta) or for men (cyan). As
seen in this figure, on average, women have higher brain connectivity than men
in both hemispheres, on the directed connection pairs shown. Figure 4 also
shows that women have higher inter-hemispheric connectivity than men, in
agreement with (Jahanshad et al., 2011). Nevertheless, men have some higher
probabilities of connection than women, mainly in the right hemisphere (Figure
4). Table S13 in the supplement shows in more detail the statistics of each
connection pair (36) with their means and p-values. The five largest relative
differences with the lowest p-values were in the following connections: Pars
Opercularis - Post Central and Frontal Pole - Caudal Anterior Cingulate, in
the left hemisphere, Inferior Parietal - Corpus Callosum, in the right
hemisphere, and the inter-hemispheric connections Cuneus (right) - Lateral
Occipital (left) and Inferior Parietal (left) - Corpus Callosum (right).

Kinship Differences. Figure 5 shows the statistically significant differences
between a) identical twins and non-identical multiples, b) identical twins and
non-twin siblings, c) identical twins and unrelated pairs of individuals, d)
non-identical multiples and non-twin siblings, e) non-identical multiples and
unrelated pairs of individuals, and f) non-twin siblings and unrelated pairs
of individuals, thus covering all possible pair-wise comparisons between these
four groups. The reported differences have a Z-score of 2.67 or higher, as
required by the FDR error control over all possible pair-wise comparisons. As
may be expected for a genetically influenced trait (Thompson et al., 2001),
greater differences are found between unrelated pairs of individuals and
siblings than between non-twin siblings and twins. Also, greater differences
are found between siblings and twins than between identical twins and
non-identical multiples. The color map indicates where the differences are
higher for the first group (magenta) or for the second (cyan).
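
The six comparisons a) to f) are simply all unordered pairs of the four
kinship groups. A short sketch makes the enumeration explicit; the group
labels are copied from the text, and everything else is illustrative.

```python
from itertools import combinations

groups = [
    "identical twins",
    "non-identical multiples",
    "non-twin siblings",
    "unrelated pairs",
]

# All unordered pairs of four groups: 4 * 3 / 2 = 6 comparisons.
comparisons = list(combinations(groups, 2))
for label, (first, second) in zip("abcdef", comparisons):
    print(f"{label}) {first} vs. {second}")
```

Every tested feature is evaluated once per comparison, so the number of
hypotheses entering the FDR control grows sixfold relative to a two-group
contrast such as sex.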

Of special interest are the connections that show the highest Z-score
differences between identical twins and non-identical twins (Figure 5):
Lateral Orbitofrontal - Middle Temporal, Rostral Middle Frontal -
Supra-marginal, and Supra-marginal - Rostral Middle Frontal, in the left
hemisphere, and the inter-hemispheric connection Corpus Callosum (left) -
Medial Orbitofrontal (right).

Most of the differentiating connections between identical twins and
non-identical twins are either in the left hemisphere or in the
inter-hemispheric connections. A similar behavior can be observed in the
differences between identical twins and non-twin siblings.

4.2.2. Topological Metrics

We now concentrate on the topological metrics and study their strength in
distinguishing between the different groups and between real brain networks
and random ones.

Random Networks. We first report differences between real brain connectivity
networks and random networks, obtained by rewiring, at random, the original
brain connectivity networks while preserving the in and out node degrees
(recall that following the normalization, the obtained networks are directed).
Table 6 shows the mean and standard deviation (within parentheses) of the
topological metrics tested, and the Z-score for the difference between the
real networks and the corresponding random networks for each topological
metric.
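
Degree-preserving random rewiring of a directed network can be sketched with
repeated target swaps, as below. This is a simplified illustration: it treats
the network as a binary edge list, ignores connection weights, and uses
hypothetical function names, so it should not be read as the exact procedure
used to generate the random networks of Table 6.

```python
import random


def rewire_preserving_degrees(edges, n_swaps, seed=0):
    """Randomize a directed graph by swapping edge targets.

    Each accepted swap turns (a -> b, c -> d) into (a -> d, c -> b), so every
    node keeps its in-degree and its out-degree.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def degrees(edges):
    """Return (out-degree, in-degree) dictionaries for a directed edge list."""
    out_deg, in_deg = {}, {}
    for source, target in edges:
        out_deg[source] = out_deg.get(source, 0) + 1
        in_deg[target] = in_deg.get(target, 0) + 1
    return out_deg, in_deg


# Toy directed network on five nodes (made up).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3), (2, 4), (4, 0)]
rewired = rewire_preserving_degrees(edges, n_swaps=100)
print(sorted(rewired))
```

Because only the targets are exchanged, any difference in clustering,
modularity, or motif counts between the original and the rewired network
cannot be attributed to the degree sequence.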

The exponent γ of the scale-free, node degree truncated power law distribution
(Bullmore and Bassett, 2010; Boccaletti et al., 2006) is also shown. From the
13 possible directed motifs of size three mentioned before (Figure S2), only
motifs 9 and 13 are present in the brain connectivity matrices analyzed here,
and therefore only the intensities (Section 3.2.2) of these two motifs are
compared in the table.

The FDR multiple hypothesis testing error control rejects all null hypotheses
with a Z-score equal to or above 2.12, at a family-wise error control level of
0.05. Hence, the global clustering coefficient, modularity, and motifs 9 and
13 can be used to differentiate real brain connectivity networks from their
corresponding random networks.
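
The way a BH-FDR control translates into a minimum Z-score can be sketched as
follows. The code implements the standard Benjamini-Hochberg step-up rule and
converts the largest rejected p-value into a one-tailed Z-score; the p-values
are made up, the function names are hypothetical, and the thresholds reported
in the text come from the study's own data, not from this example.

```python
from statistics import NormalDist


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking the hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) with p_(k) <= k * q / m; 0 means no rejection.
    k_max = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for index in order[:k_max]:
        rejected[index] = True
    return rejected


def min_z_for_rejection(p_values, q=0.05):
    """One-tailed Z-score of the largest rejected p-value (None if no rejection)."""
    rejected = benjamini_hochberg(p_values, q)
    cutoffs = [p for p, r in zip(p_values, rejected) if r]
    return NormalDist().inv_cdf(1 - max(cutoffs)) if cutoffs else None


p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p_values)
z_min = min_z_for_rejection(p_values)
print(rejected, z_min)
```

The cutoff depends on how many hypotheses are tested and how many are
rejected, which is why the minimum Z-score changes from one family of tests to
another.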

As the node degrees in the brain connectivity networks follow a truncated
power law, we can say that these networks are scale-free.

Since the characteristic path of these networks is as efficient as that of the
corresponding random networks, while the clustering coefficient and modularity
are higher, we can infer that brain networks satisfy the small-world property,
i.e., they combine high modularity with a robust number of inter-modular short
paths (Rubinov and Sporns, 2010; Boccaletti et al., 2006).

We have thus demonstrated small-worldness of anatomical brain connectivity
networks using a relatively large number of samples, and found that, according
to other topological metrics, the networks are non-random.

Sex Differences. Following the hierarchical scheme of Section 3.3.2 (see also
Figure 1), we threshold the connectivity matrices at different screening
values and compute the one-tailed p-values obtained from the bootstrapped
distributions of the mean (Equation (15)), for each one of the 9 topological
metrics considered. Figure S4 details these results in terms of the Z-score
for each topological metric, when the connectivity matrices are thresholded in
the [0, 0.05] range, as well as the BH-FDR threshold. The BH-FDR method
requires a minimum Z-score of 2.5, from which we conclude that only the
clustering coefficient satisfies the FDR error control at the node level. In
addition, the eigenvalues of the communicability matrix may be tested for
statistical significance at this level (Figure 1), to check whether the
communicability matrix should be tested at the node-to-node level.

Figure 6a shows the Z-score for the differences in the clustering coefficient,
due to sex, on each node, while Figure 6b shows the Z-score for the eigenvalue
differences of the communicability matrix, also due to sex. Higher clustering
coefficients for women are shown in magenta, while higher clustering
coefficients in men are indicated in cyan. Figures 6a and 6b also indicate, in
black dashed lines, the minimum Z-score (2.13) required by the BH-FDR error
control on both families of tests, at q = 0.05. Table S14 in the supplement
details the sex differences in the clustering coefficient. In this figure,
most differences are in the left hemisphere, which agrees with previous
results indicating that women have higher brain connectivity than men in the
left hemisphere (Jahanshad et al., 2011; Gong et al., 2009). Here, we obtained
similar results with a relatively larger number of HARDI images and using all
the brain regions indicated in Table S1.

We found that the following cortical regions in the left hemisphere have a
larger clustering coefficient in women than in men: Caudal Anterior Cingulate,
Pars Orbitalis, Rostral Anterior Cingulate, and Rostral Middle Frontal. In the
right hemisphere, we found that the Cuneus and Middle Temporal cortical
regions also have a larger clustering coefficient in women than in men.

Figure 6b indicates that in the spectral decomposition of the communicability
matrix (Section 3.2.1), one eigenvalue was found to be statistically
significant for the differences between women (magenta) and men (cyan), so
there are sex differences in the communicability matrix at the node-to-node
level.

Figures 7a and 7b show the Z-score for the statistically significant sex
differences in the edge betweenness centrality (EBC) and the communicability
matrix, respectively. For simplicity, the figures only show the Z-scores for
the sex differences exceeding the minimum Z-score (3.29) required by the
BH-FDR error control over both families of hypothesis tests at the 0.05 level.
In both figures, higher EBC or communicability values for women are indicated
in magenta, while higher EBC or communicability values for men are indicated
in cyan.

As seen in Figure 7a, only five entries in the EBC matrix are statistically
significant at this confidence level, and are indicated in more detail in
Table S15 (supplementary material). In particular, the EBC metric is higher in
women than in men for the following connections in the left hemisphere:
Non-cortical - Lingual and Lingual - Parahippocampal. In the right hemisphere,
we found that the EBC metric is higher in women than in men for the Precuneus
- Corpus Callosum connection. Finally, the EBC metric on the inter-hemispheric
connection Supra-marginal (left) - Peri-calcarine (right) is also higher in
women than in men. The p-values are around 10⁻⁴, indicating a very high
confidence level.

Figure 7b shows that 12 differences in the directed communicability matrix are
statistically significant. These differences are explained in more detail in
Table S16 (supplementary material). In general, women have higher directed
communicability values than men in the inter-hemispheric region. These
communicability values are very small (3 × 10⁻⁸ to 7 × 10⁻⁴); this is because
only long walks are present between the indicated nodes, and the contribution
of those walks to the communicability matrix is significantly reduced by the
factorial of the walk length in the definition of the communicability matrix
(Section 3.2.1). For subsequent studies that focus on the communicability
matrix, we recommend zooming in on longer walks, as suggested in (Estrada,
2010).
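
The effect of the factorial weighting can be illustrated with a small
calculation. The sketch assumes the standard communicability series, in which
walks of length k between two nodes contribute (A^k) / k!; the generalized
communicability matrix used in this work may weight walks differently, and the
toy network and helper names are hypothetical.

```python
from math import factorial


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def walk_contributions(adjacency, source, target, max_length):
    """Contribution (A^k)[source][target] / k! of walks of length k = 1..max_length."""
    n = len(adjacency)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = identity
    contributions = []
    for k in range(1, max_length + 1):
        power = mat_mul(power, adjacency)
        contributions.append(power[source][target] / factorial(k))
    return contributions


# Toy directed chain 0 -> 1 -> 2 -> 3 with weight 0.1 on every edge (made up).
w = 0.1
chain = [
    [0, w, 0, 0],
    [0, 0, w, 0],
    [0, 0, 0, w],
    [0, 0, 0, 0],
]
terms = walk_contributions(chain, source=0, target=3, max_length=6)
communicability_0_3 = sum(terms)
print(terms, communicability_0_3)
```

Nodes 0 and 3 are joined only by a walk of length three, so their
communicability is w³ / 3!, already three to four orders of magnitude below a
direct connection of the same weight.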

Most of the statistically significant differences found between women and men
in the communicability matrix are in the inter-hemispheric region, and the
p-values of these differences are of the order of 10⁻⁴. In particular, the
highest differences found were Middle Temporal (left) - Medial Orbitofrontal
(right), Frontal Pole (right) - Parahippocampal (left), Superior Temporal
(left) - Medial Orbitofrontal (right), Transverse Temporal (right) -
Parahippocampal (left), and Lingual (right) - Parahippocampal (left).

Finally, the overall FDR for this line of research is FDR ≤ q = 0.15 (see
Section 3.2).

Kinship Differences. As in the previous section, we thresholded the
connectivity matrices at different screening values and computed the
one-tailed p-values obtained from the bootstrapped distributions of the
mean (Equation (15)), for each one of the 9 topological metrics considered
and for all pair-wise comparisons of kinship groups. The BH-FDR method
requires a minimum Z-score in the 2.8-3.0 range, depending on the threshold
used (Figure S5 shows these results in greater detail). None of the global
topological metrics was statistically significant when controlling the
false discoveries at the 0.05 or even at the 0.1 level. This is likely
because there are 9 x 6 = 54 hypothesis tests for all possible pair-wise
comparisons of kinship. The ANOVA single-factor F-ratio reduces this number
to 34 on average, but there are still too many comparisons, and most global
metrics have very low Z-scores (high p-values). One possibility for future
analysis would be to consider each case independently, providing different
metrics for each pair-wise comparison. However, we decided to follow the
hierarchical screening process (see Figure 1), and test only the
communicability matrix eigenvalues at the node level.
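The effect of the number of tests on the BH-FDR step can be sketched with
the standard Benjamini-Hochberg step-up rule (Benjamini and Hochberg, 1995).
This is a minimal illustration, not the hierarchical procedure used in this
work; the function name and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Standard BH step-up rule.

    Returns a list of booleans, True where the null hypothesis is rejected
    while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical family of 10 tests: the two smallest p-values are rejected.
small_family = [0.001, 0.008, 0.039, 0.041, 0.042,
                0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(small_family))

# Hypothetical family of 54 tests (9 metrics x 6 comparisons) in which only
# one p-value is small: 0.001 > 0.05 / 54, so nothing is rejected.
large_family = [0.001] + [0.5] * 53
print(sum(benjamini_hochberg(large_family)))
```

The second call shows how a p-value that would be significant on its own
can fail to survive when it is the only small value among 54 tests.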

Figure 8 shows the communicability eigenvalues for all possible pair-wise
comparisons. The communicability eigenvalues do not provide differentiation
between identical twins and unrelated pairs of individuals at the minimum
Z-score (2.12) required by the BH-FDR error control. This indicates that
the communicability matrix might not be able to distinguish kinship
relationships at the node-to-node level. The fact that the eigenvalues of
the communicability matrix could not distinguish all kinship pair-wise
comparisons does not necessarily imply that we cannot find differences
using the communicability matrix. However, as explained in Section 3.3.2,
we follow a conservative approach, and do not test the communicability
matrix at the highest resolution. A complementary study focusing just on
the communicability matrix could test it directly to see if it provides
statistically significant differences in kinship.

Figure 9 shows the statistically significant edge betweenness centrality
(EBC) differences for all pair-wise kinship comparisons. The EBC matrix
does provide significant differences for kinship identification at the
required BH-FDR error control (Z-score above 2.87). In particular, the
connections that show the highest Z-score differences between identical
twins and non-identical twins were (Figure 9): Superior Frontal (right) -
Caudal Anterior Cingulate (left), Middle temporal (right) - Parahippocampal
(right), Precuneus (left) - Precuneus (right), Corpus Callosum (right) -
Rostral Middle Frontal (right), and Parahippocampal (left) - Middle
temporal (left).

The overall FDR for this line of research is FDR ≤ q = 0.15 (see Section
3.2).

5. Discussion

5.1. Normalization

In Section 2.2, we chose a normalization (Equation (3)) that aims to reduce
cortical volume differences (caused by brain size differences, for
instance). It would be very interesting to study how this normalization
affects the results if there are global differences in brain size between
groups. In a degenerative disease such as Alzheimer's disease, for example,
there is interest in whether network measures of brain connectivity are
altered by the disease. If they are, it is incumbent on those analyzing the
data to find out if the network differences are reducible to a simpler
effect, such as the absolute or relative size of a cortical region becoming
smaller. In Alzheimer's disease and mild cognitive impairment, for example,
we know there is disproportionate atrophy in the temporal, entorhinal, and
cingulate cortices (Thompson et al., 2003; Apostolova and Thompson, 2008),
and so any changes in the counts and density of fibers innervating those
areas should be tested to see if the changes are due to volume differences
in the cortical projection areas. If the proportion of fibers connecting a
given cortical region to the other cortical regions remains the same in an
atrophic brain relative to a healthy brain, then the network properties of
connectivity would not differ after such a normalization. However, if we do
normalize the connectivity matrices for the sizes of the cortical regions,
it would be possible to infer if the disease affects connectivity above and
beyond what would be expected from the size of the cortical regions alone.
Alzheimer's disease is thought to preferentially impair temporal and limbic
connectivity, at least early in the disease, and it is interesting to know
if the level of cortical disconnection goes beyond what would be seen in a
normal person with smaller cortical subregions in these areas.
Normalization of network measures to cortical ROI size can achieve this.
Most neurodegenerative diseases are expected to influence some connections
more than others, generating a change in the proportion of fibers dedicated
to each connection, when compared to the same cortical region and
corresponding connections in a healthy brain. The overall network analysis
framework developed here is currently under investigation for such studies,
such as neurodegeneration in HIV, where basal ganglia, motor, and frontal
circuits tend to be more greatly impaired than others (Thompson et al.,
2005).

5.2. Classification using Machine Learning Methods

The best overall classification performance was obtained using the
normalization indicated by Equation (3) (Sections 2 and 4.1). With this
normalization, we classified brain connectivity networks, according to sex
and kinship classes, with high accuracy, based on the raw connectivity
matrices and their associated topological metrics, mainly at the
node-to-node level. In particular, the edge betweenness and the generalized
communicability matrix were powerful for this task. These results should
extend well to unobserved data, as evaluated by the formal 10-fold
cross-validation and permutation tests. On the other hand, sex and kinship
classification results were weak using topological metrics at the node
level. This makes sense due to the large variability of the connectivity
matrices, which live in a very high dimensional space (R^{n^2}, n = 70),
requiring a higher number of features at the node-to-node resolution.
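The combination of 10-fold cross-validation with a permutation test can be
sketched as follows. This is only an illustration under stated assumptions:
a simple nearest-centroid classifier stands in for the SVM used in this
work, the data are synthetic, and all function names are hypothetical.

```python
import random


def cv_accuracy(features, labels, folds=10):
    """k-fold cross-validated accuracy of a nearest-centroid classifier."""
    n = len(features)
    correct = 0
    for fold in range(folds):
        train = [i for i in range(n) if i % folds != fold]
        test = [i for i in range(n) if i % folds == fold]
        # Class centroids are estimated from the training folds only.
        centroids = {}
        for label in set(labels[i] for i in train):
            rows = [features[i] for i in train if labels[i] == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        for i in test:
            predicted = min(
                centroids,
                key=lambda c: sum((x - y) ** 2
                                  for x, y in zip(features[i], centroids[c])),
            )
            correct += predicted == labels[i]
    return correct / n


def permutation_test(features, labels, n_perm=200, seed=0):
    """Return the observed CV accuracy and its permutation p-value."""
    rng = random.Random(seed)
    observed = cv_accuracy(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between features and labels
        if cv_accuracy(features, shuffled) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Synthetic data: 40 subjects, 5 features, two well-separated classes.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]
features = [[data_rng.gauss(3.0 * labels[i], 1.0) for _ in range(5)]
            for i in range(40)]

observed, p_value = permutation_test(features, labels)
print(observed, p_value)
```

The permutation p-value estimates how often a classifier trained on
shuffled labels matches the observed accuracy, which is the sense in which
such tests support generalization to unobserved data.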

We cannot numerically compare our sex and kinship machine learning based
classification results with previous work, since to the best of our
knowledge, no previous work has performed such studies, starting from the
raw connectivity matrices or associated topological metrics (see footnote
15).

A key advantage in achieving the classification results reported here was
provided by the embedded SVM-based automatic feature selection algorithm
(Section 3.1). This feature selection algorithm evaluates subgroups of
features, eliminating redundancies and identifying features that, when
considered individually, might not be very influential, but can be so as a
group. The number of features selected by this feature selection method is
close to (but lower than) the number of samples. This hints that each
connectivity matrix provides distinctive features, unobtainable from the
remaining ones. Therefore, it will be interesting to investigate, as we
increase the number of samples, whether the number of features increases up
to a point where it saturates.

It would also be of interest to compare ranking versus wrapper feature
selection methods, in combination with different classifiers such as
logistic, Bayesian, and neural network classifiers. A larger study should
be conducted to test these classifiers on different datasets and with
different tractography algorithms (see Section 5.4 for a discussion).

15 Of course, other studies focusing on sex and inheritance differences
have been conducted in the past, as mentioned in the text and cited in the
bibliography.

5.3. Hypothesis Testing

5.3.1. Sex Differences

We found significant statistical differences, due to sex, in the mean
values of 36 edges in the connectivity matrices. In line with prior work,
we found that there are, on average, structural brain connectivity
differences between women and men. In particular, women have a higher
probability of inter-hemispheric connections than men, as well as higher
probabilities of connections within both hemispheres (as defined in Section
2), with some exceptions of course (Figure 4). This seems to suggest that,
on average, women have greater structural connectivity supporting
inter-hemispheric communication than men. The higher strength of the
connections in both hemispheres seems to suggest that the communication
between the cortical regions associated with those connections is slightly
better supported structurally in women than in men.

We must point out here, however, that these differences are on average.
Given the large variability of brain connectivity networks, we can always
find individual men with higher connectivity values than some women, e.g.,
for the features indicated in Figure 4 (and Table S10).

We also found here that the topological metrics mean clustering
coefficient, communicability matrix, and edge betweenness centrality allow
us to distinguish between men and women. In particular, the mean clustering
coefficient is higher in women than in men, especially in the left
hemisphere and in the cortical regions indicated in Section 4.2.2. On
average, the neighborhood of these cortical regions is more strongly
connected for women than for men. We also find that women have a
statistically significantly higher edge betweenness centrality metric in
five connections (Section 4.2.2). This means that these connections are
more frequently used in shortest-path communications in women than in men.
Finally, we found that women also have statistically significantly higher
communicability values centered on the inter-hemispheric connections
indicated in Section 4.2.2. This suggests that the inter-hemispheric
communication is stronger in women than in men, supporting the results from
the connectivity matrices, but now at a higher scale that includes walks of
any length.

Previous results on structural differences in the brain connectivity matrix
(Jahanshad et al., 2011) and some topological metrics (different from the
ones used here) on the associated graph (Gong et al., 2009) agree with the
results of this work. In particular, these studies indicate that women have
stronger inter-hemispheric connections than men (Jahanshad et al., 2011),
that women show greater overall cortical connectivity, and that the
underlying organization of their cortical networks is more efficient, both
locally and globally (Gong et al., 2009), all in agreement with our
results. We arrived here at the same overall conclusions using a larger
number of high quality HARDI images, a larger number of topological
metrics, and formal control of the overall FDR.

5.3.2. Kinship Differences

We found significant statistical differences in the mean distribution of
the pair-wise absolute differences in the connectivity matrices and
associated topological metrics, allowing us to distinguish among the
kinship classes of identical twins, non-identical twins, non-twin siblings,
and unrelated pairs of individuals. As expected from a genetically
influenced trait, these differences increase as the pair of subjects are
less and less related. For instance, the structural differences between
identical twins and non-identical twins are less than the structural
differences between twins and non-twin siblings. We cannot make the same
kind of comparisons we did between females and males, since the differences
reported correspond to differences among classes, where each class is
constituted by within-class pair-wise differences. The differences reported
here were made explicitly for classification purposes, using machine
learning methods and hypothesis testing.
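The idea of comparing classes built from within-pair absolute differences
can be sketched as follows. This is a minimal illustration on synthetic
data, assuming a Z-score defined as the bootstrapped difference of class
means divided by its bootstrap standard deviation; the exact statistic of
the Methods section is not reproduced here, and every name and number
below is hypothetical.

```python
import random
from statistics import mean, stdev


def pair_abs_diff(pairs):
    """Absolute difference of one scalar feature within each subject pair."""
    return [abs(a - b) for a, b in pairs]


def bootstrap_means(values, rng, n_boot=2000):
    """Bootstrapped distribution of the mean (resampling with replacement)."""
    n = len(values)
    return [mean(rng.choices(values, k=n)) for _ in range(n_boot)]


def z_score(class_a, class_b, seed=0):
    """Assumed Z-score for 'class_b differences exceed class_a differences'."""
    rng = random.Random(seed)
    boot_a = bootstrap_means(class_a, rng)
    boot_b = bootstrap_means(class_b, rng)
    diffs = [b - a for a, b in zip(boot_a, boot_b)]
    return mean(diffs) / stdev(diffs)


# Synthetic feature values: twin pairs are nearly equal, unrelated pairs
# are drawn independently.
data_rng = random.Random(42)
twin_pairs = []
for _ in range(50):
    x = data_rng.gauss(0.0, 1.0)
    twin_pairs.append((x, x + data_rng.gauss(0.0, 0.1)))
unrelated_pairs = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0))
                   for _ in range(50)]

z = z_score(pair_abs_diff(twin_pairs), pair_abs_diff(unrelated_pairs))
print(z)
```

Each class is a set of within-pair differences, so the test compares how
different the members of a pair are, not the feature values themselves.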

Previous and complementary studies on structural brain connectivity
differences due to inheritance (Jahanshad et al., 2010; Thompson et al.,
2001) cannot be directly compared with our results, since those studies do
not work directly with the raw connectivity matrices.

Overall, the sex and kinship classification performances (with automatic
feature selection) are very good using the communicability and edge
betweenness topological metrics, but slightly inferior to using the
connectivity matrices directly. We believe that the reason for this is that
topological metrics are at a higher scale and offer less detail than edges.

5.4. Dependence on the Tractography Algorithm

A key issue in the repeatability of the findings of any study on structural
brain differences based on the DWI-derived connectivity matrix is the
(possibly) strong dependence on the tractography algorithm and the
parameters used for that algorithm. Indeed, this study, as well as previous
studies on structural brain connectivity, assumes that the number of
pathways connecting any pair of cortical regions has been correctly
identified by tractography. Nevertheless, tractography results can vary
significantly depending on the algorithm and its parameters, the signal to
noise ratio of the data, and registration (see for instance Hagmann et al.
2006; Shimony et al. 2006). In particular, simple tensor-based tractography
algorithms produce quite different results from ODF-based models (Hagmann
et al., 2006), and even the most sophisticated tractography algorithms can
produce different results when different parameters are employed.

Taking this caveat into account, we used a state-of-the-art probabilistic
HARDI tractography algorithm (Section 2), which performs an exhaustive
search of all the possible anatomical connections, thus avoiding local
minima and hence being robust to variability with respect to different
parameters. The results presented here, as well as those of previous
similar studies, are subject to the (unknown) accuracy of the tractography
algorithm, and thus statistical results may vary.

In order to further increase the confidence in our results, in addition to
the ODF-based probabilistic tractography algorithm used here, we tested a
simpler, less robust but very popular tensor-based tractography algorithm
implemented in the Trackvis toolbox (see footnote 16). We do not report in
detail the results from this tractography, since in general probabilistic
tractography algorithms are superior (Hagmann et al., 2006), and in
particular the one used here (Aganj et al., 2011). Nevertheless, we now
briefly discuss how the results using this tensor-based tractography model
compare with the detailed results reported in Section 4. Selected snapshots
of the results with this tractography are presented in the supplementary
material, Figures S6-S8.

Overall, the classification accuracies are similar using both tractography
models. In addition, the overall sex differences are qualitatively the
same: higher inter-hemispheric and overall within-hemisphere connections in
females than in males. We also obtained statistically significant features
to discriminate all the kinship classes using the same topological metrics
indicated before. However, the particular features identified as
significant for classification, and using hypothesis testing, differ
between the two tractography algorithms. This is clearly not a failure of
the methodology proposed here, but a limitation of the current
state-of-the-art tractography algorithms. Moreover, the lower robustness of
the tensor-based tractography algorithms is expected to lead to such
differences in selected features, since, for example, certain less-complex
pathways can be more consistent and less affected by such lower
tractography performance. Features selected by ODF-based probabilistic
tractography are expected to be more reliable.

16 http://trackvis.org/

While the methodology proposed here is expected to be robust to small
variations in the connectivity matrices, it can certainly be affected by
artifacts coming from tractography or other sources that could seriously
bias the connectivity matrices. The robustness of the proposed method
relies in turn on the robustness of the feature selection, classification,
performance evaluation, and FDR error control methods, which, as shown in
the Methods, have strong theoretical and practical foundations.

5.5. FDR Error Control

There is a general consensus in the scientific community that the FDR must
be controlled when multiple hypotheses are being tested on the same data.
There is, however, no general agreement on how to control the FDR when
multiple families of hypotheses are tested along the same line of research.
As shown in Section 4.2, a strict FDR error control on multiple families of
hypotheses can significantly reduce the number of null-hypotheses that are
rejected, and hence the number of discoveries made. This is an issue that
has been seriously addressed recently, especially in gene expression
studies, where multiple families of thousands of hypotheses must be tested
on each gene (Yekutieli, 2008). We combined the screening method proposed
by Rubinov and Sporns (2010), Bullmore and Bassett (2010), Achard and
Bullmore (2007), and Bassett et al. (2008), and the ANOVA F-ratio test, to
reduce the number of uninteresting null-hypotheses, with the novel
hierarchical approach of Yekutieli (2008), Benjamini and Yekutieli (2005),
and Yekutieli et al. (2006), to control the FDR, thus increasing the
statistical power when compared to a naive overall FDR error control. In
spite of this, we cannot reject any null-hypothesis on the kinship classes
at the topological global level, and only one of the hypotheses tested at
this level was significant for sex differences. We could have dropped the
control of the overall FDR error, considering that it was too strict, but
did not, because that undermines the essence of the FDR error control.
Indeed, the same reason why we must control the false discovery rate on
single families of hypothesis tests persists on multiple families of
hypothesis tests (on the same research line): the higher the number of
hypotheses being tested on the same data, the higher the probability of
rejecting null-hypotheses by chance, especially when most of the
null-hypotheses are true or can barely be rejected either individually or
at the family level.

There is, however, a need for less conservative FDR error control,
especially when the expected proportion of true null-hypotheses is high,
i.e., we expect few true discoveries among many true null-hypotheses. The
high number of individuals considered here improves the accuracy of the
estimated distribution of the mean (via bootstrapping). However, the FDR
error control is blind to this, since the number of hypotheses being tested
depends only on the number of features at each scale (see Methods), which,
in our case, can be O(n^2), n being the number of nodes in the network. The
FDR error control penalizes smaller and larger studies alike. Further
studies should be conducted to make the FDR error control less
conservative, especially on larger population studies.
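How the number of hypotheses alone raises the bar can be sketched with the
strictest threshold of the standard BH rule, q/m, converted to a one-tailed
Z-score. This is an illustrative bound for an isolated discovery under the
plain BH rule, not the hierarchical thresholds reported in Section 4; the
function name and the values of m are assumptions chosen for illustration.

```python
from statistics import NormalDist


def z_for_isolated_discovery(m, q=0.05):
    """One-tailed Z-score whose p-value equals q / m.

    Under the standard BH rule, a p-value that is the only small one among
    m tests must fall below q / m to be rejected; the sample size does not
    enter this threshold.
    """
    return NormalDist().inv_cdf(1.0 - q / m)


# m = 9 global metrics, 54 pair-wise kinship tests, n = 70 nodes, and
# n^2 = 4900 node-to-node entries.
for m in (9, 54, 70, 70 * 70):
    print(m, round(z_for_isolated_discovery(m), 2))
```

The threshold grows with m regardless of how many subjects were scanned,
which is the sense in which the FDR control is blind to the sample size.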

6. Conclusion

In this large scale HARDI study of 303 individuals, we introduced a
unifying, robust, and general method to investigate brain connectivity
differences among individuals (including pairs of individuals) using
machine learning and hypothesis testing methods. We also reported
differences among groups or classes of individuals using multiple
hypothesis tests at several levels of the data hierarchy.

We considered both raw connectivity matrices and derived topological
metrics, at multiple levels: global, single node, and node-to-node. Feature
selection using a wrapper (or embedded method) was critical to eliminate,
for classification purposes, uninformative connections in the connectivity
matrix or topological metrics on the associated digraphs.

Future work will focus on metrics at different scales and at the highest
resolution scale (as was done with the connectivity matrices). The study
will also be extended to larger datasets, permitting other kinds of genetic
studies, and to denser connectivity matrices derived from various
tractography methods. Of great interest is a formal study of the
sensitivity of classification, feature selection, and multiple hypothesis
testing to the tractography model.




Acknowledgments 


Work partially supported by NIH P41 RR008079, NIH P30 NS057091, NIH R01
EB008432, ONR, NGA, NSF, NSSEFF/AFOSR, and ARO. NJ was additionally
supported by NIH NLM Grant T15 LM07356. This study was supported by grant
number R01 HD050735 from the National Institute of Child Health and Human
Development, USA, and Project Grant 496682 from the National Health and
Medical Research Council, Australia. Additional support for algorithm
development was provided by the NIA, NIBIB, and the National Center for
Research Resources (AG016570, EB01651, RR019771 to PT). The authors would
like to thank Dr. Daniel Yekutieli for his feedback on the correct
interpretation of the hierarchical control of the FDR, and Dr. Ernesto
Estrada for his feedback on the correct interpretation of the
communicability matrix for directed graphs and for providing us with
further bibliography on the subject. We are also grateful to the twins for
their willingness to participate in our studies, and to research nurses
Marlene Grace and Ann Eldridge, Queensland Institute of Medical Research,
for twin recruitment.

References

Abramovich, F., Benjamini, Y., 1996. Adaptive thresholding of wavelet
coefficients. Comput. Stat. Data An. 22, 351-361.

Achard, S., Bullmore, E. T., 2007. Efficiency and cost of economical brain
functional networks. PLoS Comput. Biol. 3 (e17).




Aganj, I., Lenglet, C., Jahanshad, N., Yacoub, E., Harel, N., Thompson,
P. M., Sapiro, G., 2011. A Hough transform global probabilistic approach
to multiple-subject diffusion MRI tractography. Med. Image Anal. Epub
ahead of print 2011 Jan 26.

Amaldi, E., Kann, V., 1998. On the approximation of minimizing non zero
variables or unsatisfied relations in linear systems. Theoretical Computer
Science 209, 237-260.

Apostolova, L., Thompson, P. M., 2008. Mapping progressive brain
structural changes in early Alzheimer's disease and mild cognitive
impairment. Neuropsychologia 46 (6), 1597-1612.

Basser, P. J., Pierpaoli, C., 1996. Microstructural and physiological
features of tissues elucidated by quantitative-diffusion-tensor MRI. J.
Magn. Reson. 111 (3), 209-219.

Bassett, D. S., Brown, J. A., Deshpande, V., Carlson, J. M., Grafton, S.,
2011. Conserved and variable architecture of human white matter
connectivity. Neuroimage 54 (2), 1262-1279.

Bassett, D. S., Bullmore, E. T., Verchinski, B. A., Mattay, V. S.,
Weinberger, D. R., Meyer-Lindenberg, A., 2008. Hierarchical organization
of human cortical networks in health and schizophrenia. J. Neurosci.
28 (37), 9239-9248.

Bassett, D. S., Greenfield, D. L., Meyer-Lindenberg, A., Weinberger, D. R.,
Moore, S. W., Bullmore, E. T., 2010. Efficient physical embedding of
topologically complex information processing networks in brains and
computer circuits. PLoS Comput. Biol. 6 (4), e1000748.

Behrens, T. E. J., Berg, H. J., Jbabdi, S., Rushworth, M. F. S., Woolrich,
M. W., 2007. Probabilistic diffusion tractography with multiple fibre
orientations: What can we gain? Neuroimage 34 (1), 144-55.

Benjamini, Y., Heller, R., Yekutieli, D., 2009. Selective inference in
complex research. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 367 (1906),
4255-4271.

Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J. Roy. Stat. Soc. B.
Met. 57 (1), 289-300.

Benjamini, Y., Hochberg, Y., 2000. On the adaptive control of the false
discovery rate in multiple testing with independent statistics. J. Educ.
Behav. Stat. 25 (1), 60-83.

Benjamini, Y., Yekutieli, D., 2001. The control of the false discovery rate
in multiple testing under dependency. Ann. Statist. 29 (4), 1165-1188.

Benjamini, Y., Yekutieli, D., 2005. False discovery rate-adjusted multiple
confidence intervals for selected parameters. J. Am. Stat. Assoc. 100,
71-81.

Benjamini, Y., Yekutieli, D., 2005. Quantitative trait loci analysis using
the false discovery rate. Genetics 171 (2), 783-790.

Blondel, V., Guillaume, J., Lambiotte, R., Lefebvre, E., 2008. Fast
unfolding of communities in large networks. J. Stat. Mech., P10008.




Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.-U., 2006.
Complex networks: Structure and dynamics. Phys. Rep. 424 (4-5), 175-308.

Brin, S., Page, L., 1998. The anatomy of a large-scale hypertextual web
search engine. In: Publishers, E. S. (Ed.), Proc. Intl. Conf. World Wide
Web. Vol. 30. pp. 1-7.

Bullmore, E. T., Bassett, D. S., 2010. Brain graphs: Graphical models of
the human brain connectome. Annu. Rev. Clin. Psycho. Epub ahead of print
2010 April 5.

Bullmore, E. T., Sporns, O., 2009. Complex brain networks: graph
theoretical analysis of structural and functional systems. Nat. Rev.
Neurosci. 10 (3), 186-198.

Crofts, J. J., Higham, D. J., 2009. A weighted communicability measure
applied to complex brain networks. J. R. Soc. Interface 6 (33), 411-414.

Davison, R., MacKinnon, J. G., 1999. The size distortion of bootstrap
tests. Vol. 15 of Econometric Theory. Cambridge University Press.

de Boer, R., Schaap, M., van der Lijn, F., Vrooman, H. A., de Groot, M.,
van der Lugt, A., Ikram, M. A., Vernooij, M. W., Breteler, M. M., Niessen,
W. J., 2011. Statistical analysis of minimum cost path based structural
brain connectivity. Neuroimage 55 (2), 557-565.

Dosenbach, N. U. F., Nardos, B., Cohen, A. L., Fair, D. A., Power, J. D.,
Church, J. A., Nelson, S. M., Wig, G. S., Vogel, A. C., Lessov-Schlaggar,
C. N., Barnes, K. A., Dubis, J. W., Feczko, E., Coalson, R. S., Pruett,
J. R., Barch, D. M., Petersen, S. E., Schlaggar, B. L., 2010. Prediction of
individual brain maturity using fMRI. Science 329 (5997), 1358-1361.

Duda, R. O., Hart, P. E., 1972. Use of the Hough transformation to detect
lines and curves in pictures. Commun. ACM 15 (1).

Easley,  D.,  Kleinberg,  J.,  2010.  Networks,  Crowds,  and  Markets:  Reasoning 
about  a  Highly  Connected  World.  Cambridge  University  Press. 

Estrada,  E.,  2010.  Generalized  walks-based  centrality  measures  for  complex 
biological  networks.  J.  Theor.  Biol.  263  (4),  556-565. 

Estrada, E., Higham, D. J., 2010. Network properties revealed through matrix
functions. SIAM Review 52 (4), 696-714.

Fischl, B., van der Kouwe, A., Destrieux, C., Halgren, E., Segonne, F., Salat,
D. H., Busa, E., Seidman, L. J., Goldstein, J., Kennedy, D., Caviness,
V., Makris, N., Rosen, B., Dale, A. M., 2004. Automatically
parcellating the human cerebral cortex. Cereb. Cortex 14 (1), 11-22.

Fisher, H., 2011. A History of the Central Limit Theorem. From Classical to
Modern Probability Theory, 1st Edition. Springer. ISBN 978-0-387-87856-0.

Fornito, A., Zalesky, A., Bassett, D. S., Meunier, D., Ellison-Wright, I., Yu,
M., Wood, S. J., Shaw, K., O’Connor, J., Nertney, D., Mowry, B. J.,
Pantelis, C., Bullmore, E. T., 2011. Genetic influences on cost-efficient
organization of human cortical functional networks. J. Neurosci. 31 (9),
3261-3270.




Gigandet, X., Hagmann, P., Kurant, M., Cammoun, L., Meuli, R., Thiran,
J.-P., 2008. Estimating the confidence level of white matter connections
obtained with MRI tractography. PLoS ONE 3 (12), e4006.

Gong, G., Rosa-Neto, P., Carbonell, F., Chen, Z. J., He, Y., Evans, A. C.,
2009. Age- and gender-related differences in the cortical anatomical
network. J. Neurosci. 29 (50), 15684-15693.

Gonzales,  R.  C.,  Woods,  R.  E.,  2008.  Digital  Image  Processing,  3rd  Edition. 
Prentice  Hall. 

Guyon, I., Elisseeff, A., 2003. An introduction to variable and feature
selection. J. Mach. Learn. Res. 3, 1157-1182.

Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for
cancer classification using support vector machines. Mach. Learn. 46 (1-3),
389-422.

Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C. J., Wedeen,
V. J., Sporns, O., 2008. Mapping the structural core of human cerebral
cortex. PLoS Biology 6 (7), e159.

Hagmann, P., Jonasson, L., Maeder, P., Thiran, J.-P., Wedeen, V. J., Meuli,
R., Oct. 2006. Understanding diffusion MR imaging techniques: From scalar
diffusion-weighted imaging to diffusion tensor imaging and beyond.
RadioGraphics 26, S205-S223.

Hagmann, P., Kurant, M., Gigandet, X., Thiran, P., Wedeen, V. J., Meuli,
R., Thiran, J.-P., 2007. Mapping human whole-brain structural networks
with diffusion MRI. PLoS ONE 2 (7), e597.

Hartmann,  W.  M.,  2006.  Dimension  Reduction  vs.  Variable  Selection.  Vol. 
3732  of  Lecture  Notes  in  Computer  Science.  Springer,  pp.  931-938. 

He, Y., Chen, Z. J., Evans, A. C., 2007. Small-world anatomical networks
in the human brain revealed by cortical thickness from MRI. Cereb. Cortex
17 (10), 2407-2419.

Holmes, C. J., Hoge, R., Collins, L., Woods, R., Toga, A. W., Evans, A. C.,
1998. Enhancement of MR images using registration for signal averaging.
J. Comput. Assist. Tomogr. 22 (2), 324-333.

Iturria-Medina, Y., Canales-Rodriguez, E., Melie-Garcia, L.,
Valdes-Hernandez, P., Martinez-Montes, E., Aleman-Gomez, Y., Sanchez-Bornot,
J. M., 2007. Characterizing brain anatomical connections using diffusion
weighted MRI and graph theory. Neuroimage 36 (3), 645-660.

Jahanshad, N., Aganj, I., Lenglet, C., Joshi, A., Jin, Y., Barysheva, M.,
McMahon, K., de Zubicaray, G., Martin, N., Wright, M., Toga, A. W.,
Sapiro, G., Thompson, P. M., 2011. Sex differences in the human
connectome: 4-Tesla high angular resolution diffusion imaging (HARDI)
tractography in 234 young adult twins. In: Proc. IEEE Int. Symp. Biomed.
Imaging.

Jahanshad, N., Lee, A. D., Barysheva, M., McMahon, K. L., de Zubicaray,
G. I., Martin, N. G., Wright, M. J., Toga, A. W., Thompson, P. M., 2010.
Genetic influences on brain asymmetry: A DTI study of 374 twins and
siblings. Neuroimage 52 (2), 455-469.

Jensen, D. D., Cohen, P. R., 2000. Multiple comparisons in induction
algorithms. Mach. Learn. 38, 309-338.

Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F., Baker, C. I., 2009.
Circular analysis in systems neuroscience: the dangers of double dipping.
Nat. Neurosci. 12, 535-540.

Leonard,  C.  M.,  Towler,  S.,  Welcome,  S.,  Halderman,  L.  K.,  Otto,  R.,  Eckert, 
M.  A.,  Chiarello,  C.,  2008.  Size  matters:  Cerebral  volume  influences  sex 
differences  in  neuroanatomy.  Cereb.  Cortex  18  (12),  2920-2931. 

Leow, A., Huang, S.-C., Geng, A., Becker, J., Davis, S., Toga, A. W.,
Thompson, P. M., 2005. Inverse Consistent Mapping in 3D Deformable Image
Registration: Its Construction and Statistical Properties. Vol. 3565 of
Lecture Notes in Computer Science. Springer-Verlag, pp. 23-57.

Lohmann, G., Margulies, D. S., Horstmann, A., Pleger, B., Lepsien, J.,
Goldhahn, D., Schloegl, H., Stumvoll, M., Villringer, A., Turner, R., 2010.
Eigenvector centrality mapping for analyzing connectivity patterns in fMRI
data of the human brain. PLoS ONE 5 (4), e10232.

Ojala,  M.,  Garriga,  G.  C.,  2010.  Permutation  tests  for  studying  classifier 
performance.  J.  Mach.  Learn.  Res.  11,  1833-1863. 

Onnela, J. P., Saramaki, J., Kertesz, J., Kaski, K., 2005. Intensity and
coherence of motifs in weighted complex networks. Phys. Rev. E 71 (6), 065103.




Refaeilzadeh, P., Tang, L., Liu, H., 2009. Cross Validation. Encyclopedia of
Database Systems. Springer.

Reiner-Benaim, A., 2007. FDR control by the BH procedure for two-sided
correlated tests with implications to gene expression data analysis. Biom.
J. 49 (1), 107-126.

Reiner-Benaim, A., Yekutieli, D., Letwin, N. E., Elmer, G. I., Lee, N. H.,
Kafkafi, N., Benjamini, Y., 2007. Associating quantitative behavioral traits
with gene expression in the brain: searching for diamonds in the hay.
Bioinformatics 23 (17), 2239-2246.

Richiardi, J., Eryilmaz, H., Schwartz, S., Vuilleumier, P., Van De Ville, D.,
2010. Decoding brain states from fMRI connectivity graphs. Neuroimage
Epub ahead of print 2010 June 9.

Rubinov, M., Bassett, D. S., 2011. Emerging evidence of connectomic
abnormalities in schizophrenia. Neuroscience In press.

Rubinov, M., Sporns, O., 2010. Complex network measures of brain
connectivity: Uses and interpretations. Neuroimage 52 (3), 1059-1069.

Shepelyansky, D. L., Zhirov, O. V., 2010. Towards Google matrix of brain.
Phys. Lett. A 374, 3206-3209.

Shimony, J., Burton, H., Epstein, A. A., McLaren, D. G., Sun, S. W.,
Snyder, A. Z., November 2006. Diffusion tensor imaging reveals white matter
reorganization in early blind humans. Cereb. Cortex 16, 1653-1661.




Sporns,  O.,  Kotter,  R.,  2004.  Motifs  in  brain  networks.  PLoS  Biol.  2,  e369. 


Storey, J. D., 2002. A direct approach to false discovery rates. J. R. Statist.
Soc. B 64 (3), 479-498.

Storey, J. D., Taylor, J. E., Siegmund, D., 2004. Strong control, conservative
point estimation and simultaneous conservative consistency of false
discovery rates: a unified approach. J. R. Statist. Soc. B 66 (1), 187-205.

Thomason, M. E., Dennis, E. L., Joshi, A. A., Joshi, S. H., Dinov, I. D.,
Chang, C., Henry, M. L., Johnson, R. F., Thompson, P. M., Toga, A. W., Glover,
G. H., Van Horn, J. D., Gotlib, I. H., 2011. Resting-state fMRI can reliably
map neural networks in children. Neuroimage 55 (1), 165-75.

Thompson, P. M., Cannon, T. D., Narr, K. L., van Erp, T., Poutanen, V.-P.,
Huttunen, M., Lonnqvist, J., Standertskjold-Nordenstam, C.-G., Kaprio,
J., Khaledy, M., Dail, R., Zoumalan, C. I., Toga, A. W., 2001. Genetic
influences on brain structure. Nat. Neurosci. 4 (12), 1253-1258.

Thompson, P. M., Dutton, R. A., Hayashi, K. M., Toga, A. W., Lopez, O. L.,
Aizenstein, H. J., Becker, J. T., 2005. Thinning of the cerebral cortex in
HIV/AIDS reflects CD4+ T-lymphocyte decline. In: Proc. Nat. Acad. Sci. Vol.
102. pp. 15647-15652.

Thompson, P. M., Hayashi, K. M., de Zubicaray, G., Janke, A. L., Rose,
S. E., Semple, J., Herman, D., Hong, M. S., Dittmer, S., Doddrell, D. M.,
Toga, A. W., 2003. Dynamics of gray matter loss in Alzheimer’s disease. J.
Neurosci. 23 (3), 994-1005.




Tuch,  D.  S.,  Dec.  2004.  Q-ball  imaging.  Magn.  Reson.  Med.  52,  1358-1372. 


Vapnik, V. N., 1998. Statistical Learning Theory. Wiley-Interscience.

Westfall, P. H., Johnson, W. O., Utts, J. M., 1997. Bayesian perspective on
the Bonferroni adjustment. Biometrika 84 (2), 419-427.

Winer, B. J., 1971. Statistical Principles in Experimental Design, 2nd
Edition. McGraw-Hill, Inc.

Yekutieli, D., 2008. Hierarchical false discovery rate controlling methodology.
J. Amer. Statistical Assoc. 103 (481), 309-316.

Yekutieli, D., Reiner-Benaim, A., Benjamini, Y., Elmer, G. I., Kafkafi, N.,
Letwin, N. E., Lee, N. H., 2006. Approaches to multiplicity issues in
complex research in microarray analysis. Stat. Neerl. 60 (4), 414-437.


Appendix

Additional Implementation Details

We used the publicly available implementations of topological metrics in
the Brain Connectivity Toolbox (BCT),17 which works with weighted directed
graphs. Newer metrics, such as PageRank and the centrality and
communicability measures based on subgraphs, are not available in the BCT.
Nevertheless, a free implementation of PageRank can be found on the
web,18 and Estrada's centrality and communicability measures can be easily
obtained using the matrix exponential function (expm) in Matlab.19

17 https://sites.google.com/a/brain-connectivity-toolbox.net/bct/Home
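
To make the role of the matrix exponential concrete, the following is a minimal Python sketch, not the Matlab code used in this work. It assumes the unweighted definitions of Estrada and Higham (2010): the communicability matrix is exp(A), the subgraph centrality of node i is its i-th diagonal entry, and the Estrada index is its trace. The weighted normalization of Crofts and Higham (2009) is omitted, and the function names and the truncated Taylor series (a stand-in for expm) are illustrative assumptions.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_taylor(a, terms=30, squarings=8):
    """Approximate exp(a) by scaling and squaring with a truncated Taylor series.

    This is an illustrative stand-in for a library routine such as expm.
    """
    n = len(a)
    scale = 2.0 ** squarings
    scaled = [[x / scale for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        # undo the scaling: exp(a) = exp(a / 2^s)^(2^s)
        result = mat_mul(result, result)
    return result


def communicability_measures(adjacency):
    """Return (communicability matrix, subgraph centralities, Estrada index)."""
    g = expm_taylor(adjacency)
    subgraph_centrality = [g[i][i] for i in range(len(g))]
    return g, subgraph_centrality, sum(subgraph_centrality)


# Two nodes joined by one undirected edge.
g, centrality, estrada = communicability_measures([[0.0, 1.0], [1.0, 0.0]])
```

For this two-node graph the Estrada index equals 2 cosh(1), which gives a quick sanity check of the approximation.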

In this work, we use the Waikato Environment for Knowledge Analysis
(weka) data mining software,20 which provides feature selection,
classification, regression and n-fold cross-validation tools.21 Permutation tests were
implemented in Java using the weka, libsvm,22 and Java Statistical Classes23
(jsc) libraries. The permutation tests consist of training the classifier with
the selected features and 10-fold cross-validation, over 1,000 random
permutations of the data set labels, in order to generate the null-hypothesis
distribution. Since the computed p-values of the permutation tests strongly
depend on the performance of the classification being tested (Ojala and
Garriga, 2010), we used the average of the classification performance over
1,000 different random splittings of the data set.24 In addition, the
classification performance is not evaluated using a single parameter. Here we used
overall classification accuracy, Balanced Error Rate (BER),25 area under
the Receiver Operating Characteristic (ROC) curve, kappa statistic, and confusion
matrices.
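
The permutation procedure described above can be sketched as follows. This is an illustrative Python outline rather than the Java/weka implementation: a one-dimensional nearest-mean rule stands in for the SVM, all function names and default parameters are hypothetical, and the add-one p-value estimate is a common empirical convention that the text does not state explicitly.

```python
import random


def nearest_mean_predict(train_x, train_y, x):
    """Toy stand-in classifier: assign x to the class with the closest mean."""
    means = {}
    for label in set(train_y):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))


def cv_accuracy(xs, ys, k, rng):
    """Accuracy under k-fold cross-validation with one random split."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(nearest_mean_predict(tx, ty, xs[i]) == ys[i]
                       for i in fold)
    return correct / len(xs)


def permutation_p_value(xs, ys, k=10, n_perm=1000, n_splits=20, seed=0):
    """Return (observed accuracy, permutation p-value)."""
    rng = random.Random(seed)
    # Average the observed performance over several random splits.
    observed = sum(cv_accuracy(xs, ys, k, rng)
                   for _ in range(n_splits)) / n_splits
    exceed = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the link between data and labels
        if cv_accuracy(xs, shuffled, k, rng) >= observed:
            exceed += 1
    # Add-one estimate, so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

With 1,000 permutations, the smallest p-value this estimate can return is 1/1001, which is close to 0.001.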

18 http://read.pudn.com/downloads149/sourcecode/math/642925/pagerank.m__.htm or
http://www.levmuchnik.net/Content/Networks/NetworkPackageReference.html#Algorithms

19 http://www.mathworks.com/help/techdoc/ref/expm.html

20 http://www.cs.waikato.ac.nz/ml/weka/

21 Alternatively, the rapidMiner package provides multithreading and more flexibility
than weka, at the expense of a steeper learning curve.

22 http://www.csie.ntu.edu.tw/~cjlin/libsvm/

23 http://www.jsc.nildram.co.uk/

24 This is achieved in weka by randomly changing the seed.

25 Chosen in the NIPS 2003 feature selection challenge as the main judging criterion.

In general, classifier performance can be biased due to large differences
in the number of samples for each class. The weka toolbox allows the use
of a weight to compensate for the differences in the number of samples.
Nevertheless, this weight did not produce significant classification differences
as compared to the unweighted samples, as SVMs are less dependent on
sample size, because they rely on a few support vectors.

Single Effects F-ratio

Here, we refer to populations, factors and treatments as is usual in
experimental design. The population here refers to the bootstrapped mean
differences, due to sex for instance. Factors refer here to sex differences
measured by each one of the topological metrics considered (Section 3.2,
Figure 1), while treatments refer to the differences at each node or
node-to-node connection that produce differences in the mean value of the
topological metric at those scales. For instance, a factor is the clustering
difference (measured by the clustering coefficient) due to sex, while the
treatments correspond to the clustering differences at each node that lead to
differences in the clustering coefficient at each node. Here, we use
single-factor ANOVA F-ratios to screen out treatments that are not
statistically significant.

The single effects F-ratio is computed as the ratio of the mean square
treatment (main) effect and the mean square (variance within) treatment
error (Winer, 1971),

$$ F_i = \frac{\text{Mean Square}_{\text{treatment } i}}{\text{Mean Square}_{\text{error } i}} = \frac{(\bar{d}_{i\cdot} - \bar{d}_{\cdot\cdot})^2}{\dfrac{\sum_{j} (d_{ij} - \bar{d}_{i\cdot})^2}{B-1}}, $$

where $d_{ij}$ are the observed differences at the $i$th node or node-to-node
connection ($i = 1, \ldots$) and $j$th bootstrapped sample ($j = 1, \ldots, B$),
$\bar{d}_{i\cdot}$ is the mean value of the bootstrapped samples at $i$, and
$\bar{d}_{\cdot\cdot}$ is the overall population mean. Now, F-ratios where
$F_i > F_{(q,1,B-1)}$, $F$ being the F-distribution, are considered
statistically significant at the error control level $q$.
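
As an illustration of this ratio, the sketch below computes it for a small array of bootstrapped differences. It is a minimal Python outline under stated assumptions: `d[i][j]` holds the difference at node (or connection) i for bootstrap sample j, every row has the same number B of samples with non-zero variance, and the critical value of the F-distribution is supplied by the caller rather than computed. The function names are hypothetical.

```python
from statistics import mean


def single_effect_f_ratios(d):
    """Return one F-ratio per row of d, where d[i][j] is the bootstrapped
    difference at node (or connection) i in sample j."""
    b = len(d[0])
    row_means = [mean(row) for row in d]
    grand_mean = mean([x for row in d for x in row])
    ratios = []
    for row, m in zip(d, row_means):
        # Within-treatment variance: each row keeps its own error term.
        ms_error = sum((x - m) ** 2 for x in row) / (b - 1)
        ratios.append((m - grand_mean) ** 2 / ms_error)
    return ratios


def screen(ratios, critical_value):
    """Indices of treatments whose F-ratio exceeds the supplied critical value."""
    return [i for i, f in enumerate(ratios) if f > critical_value]


ratios = single_effect_f_ratios([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
kept = screen(ratios, critical_value=2.0)
```

Because each treatment is divided by its own error term instead of a pooled one, a node with very stable bootstrap samples can pass the screen even when its mean difference is modest.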

The usual ANOVA F-ratios divide main effects by the pooled experimental
error, assuming that error variances (within-treatment variability) are all
equal, which is a strong assumption not usually met in practice. The F-ratio
used here allows differences in the experimental error on each treatment.
This implies that this F-ratio does not exactly follow an F-distribution;
however, the sampling distribution of these F-ratios can be approximated
by the F-distribution (Winer, 1971). In addition, ANOVA F-ratios also
assume independence (no interaction) among treatments. In general, this
independence is not met in our case, since nodes are neighbors of other
nodes. For instance, the neighbors of a node with a high clustering coefficient
might also have a high clustering coefficient, since the neighbors are also in
the same cluster. However, we are working here with differences, and
differences reduce or eliminate these positive interaction effects. Hence, in our
case dependence among treatments should be weak. Nevertheless, if there is
dependence among treatments, the results of the F-ratio test are optimistic
(Winer, 1971), meaning that more treatments are accepted as influential. In
our case, it means that the test never rejects a true influential effect, while
non-influential treatments will be rejected by the subsequent FDR tests. The
only purpose of this screening test is to reduce the number of non-interesting
hypotheses to test using FDR error control, and as we have seen here, this
test does just that despite its simplicity and assumptions.

The single effects F-ratio screening is performed here controlling the error
rate at q = 0.15 at the global and node levels, in order to avoid overly reducing
the number of hypotheses to be tested, and at a 0.05 level of significance at the
node-to-node level, where thousands of hypotheses are present.
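
The FDR tests that follow the screening are not detailed in this appendix. As a point of reference only, the basic step-up procedure of Benjamini and Hochberg (1995) can be sketched in Python as below; this is not the hierarchical procedure used in this work, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q):
    """Indices of hypotheses rejected by the BH step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under the BH line.
        if p_values[i] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])


p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
```

Screening first reduces m, which raises every threshold rank * q / m and so makes the later FDR step less conservative for the hypotheses that remain.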

Regression Analysis

We tested the statistical significance of different linear regression models
including the variables sex (coded as -1 for men, +1 for women), brain volume,26
age, and different degrees of interaction, in modeling the probability of
connection on the whole data set. We found that the following model is
statistically significant in modeling the connectivity matrices, on average,

$$ y = \beta_0 + \beta_1 S + \beta_2 B + \beta_3 A + \beta_4 SB, \qquad (17) $$

where the predictors S, B, and A represent sex, brain volume, and age, respectively,
while SB represents the interaction between sex and brain volume. Given
the strong correlation between sex and brain size, we employed ridge
regression, which provides regularization when there is strong collinearity between
predictors. The Matlab implementation of ridge regression that we used also centers
and standardizes the predictors internally, which improves stability and allows
for proper comparison of the regression coefficients.
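
The model in Equation (17) with centered and standardized predictors can be sketched as follows. This is a minimal Python illustration, not the Matlab routine used in this work, and it is not claimed to reproduce that routine's output. The ridge parameter `k` is a hypothetical input (its value is not stated here), and the helper names are assumptions.

```python
from statistics import mean, stdev


def standardize(col):
    """Center a predictor and scale it to unit standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_fit(sex, volume, age, y, k):
    """Ridge coefficients [b1, b2, b3, b4] for y ~ S + B + A + S*B,
    estimated on standardized predictors with ridge parameter k."""
    cols = [sex, volume, age, [s * b for s, b in zip(sex, volume)]]
    z = [standardize(c) for c in cols]
    y_mean = mean(y)
    yc = [v - y_mean for v in y]
    p = len(z)
    # Normal equations with the ridge penalty added on the diagonal.
    gram = [[sum(u * v for u, v in zip(z[i], z[j])) + (k if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(u * v for u, v in zip(z[i], yc)) for i in range(p)]
    return solve(gram, rhs)
```

Since sex and the interaction term S*B are strongly correlated whenever brain volume is positive, the penalty `k` on the diagonal is what keeps the system well conditioned; with `k = 0` the fit reduces to ordinary least squares.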

Using the normalization provided by Equation (3), the regression
coefficients were $\beta_1 = 6.15 \times 10^{-3}$, $\beta_2 = -1.87 \times 10^{-5}$,
$\beta_3 = -2.12 \times 10^{-4}$, $\beta_4 = -6.23 \times 10^{-3}$. We can see
that the effect of sex is about 328 times larger than that of brain size and
about 30 times larger than that of age. However, there is still a strong
negative interaction due to brain size.

26 The brain volume was calculated from the manually skull-stripped images in mm$^3$
and then converted to liters.
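
The quoted ratios follow directly from the magnitudes of the coefficients for Equation (3); the short check below simply carries out that arithmetic on the reported values.

```latex
\[
\frac{|\beta_1|}{|\beta_2|} = \frac{6.15 \times 10^{-3}}{1.87 \times 10^{-5}} \approx 328.9,
\qquad
\frac{|\beta_1|}{|\beta_3|} = \frac{6.15 \times 10^{-3}}{2.12 \times 10^{-4}} \approx 29.0,
\qquad
\frac{|\beta_1|}{|\beta_4|} = \frac{6.15 \times 10^{-3}}{6.23 \times 10^{-3}} \approx 0.99 .
\]
```

The last ratio shows that the interaction coefficient has about the same magnitude as the sex coefficient, with opposite sign.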

We performed an F-test of significance of the regression model using the
un-centered and un-standardized predictors. We found that we can reject
the null hypothesis that all regression coefficients in the model are zero, with
a level of significance of 0.002. Testing the significance of each
factor (using a standard t-test), we found that the sex and age coefficients are
statistically significant with levels of significance of $2.8 \times 10^{-4}$ and 0.048,
respectively, but the brain volume coefficient and interaction term are not
statistically significant. Given that the effects of age and of the interaction with brain
volume are both negative and much smaller than the effect of sex, we disregard
those effects in the analysis. The effects of age and brain size (through the
interaction) cause a reduction in the statistical power of the analysis performed
(since their effect is negative), which means that some brain connectivity
differences due to sex that might have been influential could not be detected.
This is a small price to pay in exchange for simplicity in the analysis and
shows the importance of the normalization chosen.

The regression coefficients for the centered and standardized predictors
using the normalization provided by Equation (1) were $\beta_1 = 1.52 \times 10^{-3}$,
$\beta_2 = 7.93 \times 10^{-4}$, $\beta_3 = 2.07 \times 10^{-4}$,
$\beta_4 = -8.9 \times 10^{-3}$, which means that the sex
effect is about 2 times larger than that of brain size, 7 times larger than
that of age, and about 2 times the interaction with brain size. Formally,
the model is statistically significant, with a significance level of $7.5 \times 10^{-4}$,
and the t-test on each factor reveals that the coefficients of brain size and
age are statistically significant with significance levels of $1.5 \times 10^{-\ldots}$ and
0.035, respectively, while the sex coefficient is only statistically significant at
a significance level of 0.18. This means that brain volume and age are
more significant than sex differences, and hence any differences found using
this normalization alone (without further processing) could be false.

The regression coefficients for the centered and standardized predictors
using the normalization provided by Equation (2) were $\beta_1 = 7.58 \times 10^{-3}$,
$\beta_2 = 4.49 \times 10^{-5}$, $\beta_3 = 3.7 \times 10^{-4}$,
$\beta_4 = -7.6 \times 10^{-3}$, which means that the sex effect
is about 170 times larger than that of brain size and 20 times larger than that of
age, and there is a strong interaction with brain size. Formally, the model is
statistically significant, with a significance level of 0.05, and the t-test on each
factor reveals that the regression coefficients of sex and age are statistically
significant with significance levels of 0.007 and 0.046, respectively, while
brain size and its interaction with sex are not statistically significant. As can
be seen, this normalization is almost as good as Equation (3), but we preferred
Equation (3), since it is also superior in terms of classification performance
(see Section 3.1) and holds the interpretation described above.

Table 1: Sex classification performance (see Section 3.1) obtained from the
connectivity matrix (node-to-node level). We observe significantly improved
results when feature selection is incorporated.

Test                          All features (2763)   Feature selection (297)
Classification accuracy (%)   49.5                  93.0
Sensitivity (%)               56.5                  95.5
Specificity (%)               37.3                  88.5
Balanced error rate (BER)     0.5313                0.0797
Area under the ROC curve      0.473                 0.9203
Kappa statistic               -0.067                0.8470
p-value                       -                     0.001
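
As a consistency check on Table 1, the Balanced Error Rate can be recomputed from the reported sensitivity and specificity. The Python sketch below assumes the usual definition of BER as one minus the mean of the per-class recalls; the function name is hypothetical, and small discrepancies with the table are expected because the inputs are rounded to one decimal place.

```python
def balanced_error_rate(per_class_recall):
    """BER = 1 - mean of per-class recalls (for two classes, the recalls are
    the sensitivity and the specificity)."""
    return 1.0 - sum(per_class_recall) / len(per_class_recall)


# Sensitivity and specificity from Table 1, expressed as fractions.
ber_all_features = balanced_error_rate([0.565, 0.373])
ber_feature_selection = balanced_error_rate([0.955, 0.885])
```

Both values agree with the BER row of Table 1 to within rounding (about 0.531 and 0.080, against the reported 0.5313 and 0.0797).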

Table 2: Kinship classification performance (see Section 3.1) obtained from the
connectivity matrix (node-to-node level).

Test                                  All features (2763)   Feature selection (250)
Accuracy (%)                          63.49                 88.5 (0.010)
Sensitivity Identical Twins (%)       28.0                  80.4
Specificity Identical Twins (%)       88.2                  94.5
Sensitivity non-identical Twins (%)   46.8                  86.2
Specificity non-identical Twins (%)   77.8                  96.0
Sensitivity Siblings (%)              28.6                  72.2
Specificity Siblings (%)              92.5                  97.4
Sensitivity Unrelated People (%)      100.0                 99.9
Specificity Unrelated People (%)      88.3                  96.9
BER                                   0.3671                0.1535 (0.016)
ROC area                              0.759                 0.904 (0.01)
Kappa                                 0.4796                0.838 (0.017)
p-value                               -                     0.001 (0)

Table 3: Sex classification performance (see Section 3.1) using the clustering
coefficient (node level).

Test                          All features (70)   Feature selection (53)
Classification accuracy (%)   55.4                62.7
Sensitivity (%)               64.8                89.6
Specificity (%)               37.0                25.2
Balanced error rate (BER)     0.4983              0.4261
Area under the ROC curve      0.502               0.7309
Kappa statistic               0.0035              0.5214
p-value                       -                   0.001

Table 4: Sex classification performance (see Section 3.1) using the generalized
communicability matrix (node-to-node level).

Test              All features (4900)   FDR thresholding (935)   Feature selection (298)
Accuracy (%)      51.8                  46.2                     92.2
Sensitivity (%)   58.0                  45.1                     93.7
Specificity (%)   26.4                  30.9                     89.6
BER               0.5268                0.5780                   0.0835
ROC area          0.473                 0.429                    0.917
Kappa             -0.054                -0.139                   0.832
p-value           -                     -                        0.001

Table 5: Kinship classification performance (see Section 3.1) using edge
betweenness centrality (node-to-node level).

Test                                  All features (2388)   FDR thresholding (1031)   Feature selection (251)
Accuracy (%)                          57.1                  32.14                     87.3
Sensitivity Identical Twins (%)       22.0                  16.0                      76.4
Specificity Identical Twins (%)       84.7                  85.6                      97.0
Sensitivity non-identical Twins (%)   40.3                  31.3                      86.7
Specificity non-identical Twins (%)   82.2                  71.9                      92.0
Sensitivity Siblings (%)              25.7                  11.4                      70.9
Specificity Siblings (%)              91.2                  90.8                      97.5
Sensitivity Unrelated People (%)      97.0                  48.0                      98.8
Specificity Unrelated People (%)      83.6                  53.9                      96.1
BER                                   0.5636                0.8870                    0.1677
ROC area                              0.708                 0.511                     0.8945
Kappa                                 0.3843                0.0234                    0.820
p-value                               -                     -                         0.001

Table  6:  Global  topological  metrics  comparing  brain  connectivity  with  random  networks. 


Metric 

Human  Brain 

Random 

Z-score 

7 

2.84  (1.44) 

- 

- 

Clustering  Coefficient 

0.0766  (0.0130) 

0.0148  (0.0019) 

13.6 

Characteristic  Path 

77.50  (18.9) 

77.5  (18.9) 

0 

Node  Betweeness 

155.17  (12) 

147.64  (8.72) 

0.51 

Modularity 

0.7029  (0.0195) 

0.3380  (0.0187) 

13.51 

Rentian  Scale 

0.6958  (0.0394) 

0.7957  (0.031) 

2.0 

PageRank 

0.0143  (0.0096) 

0.0143  (0.084) 

0 

Estrada  Index 

73.1  (0.87) 

71.78  (0.55) 

1.28 

Triangular  motif  9 

3.8680  (0.7077) 

0.589  (0.173) 

4.50 

Triangular  motif  13 

1.8591  (0.4685) 

0.042  (0.0253) 

3.87 
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Figure  1:  Hierarchy  of  multiple  families  of  hypothesis  testing 
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a) 


b) 


Figure  2:  Selected  features  on  the  connectivity  matrix  for  a)  Sex  and  b)  Kinship  classifi¬ 
cation. 


a)  b) 


Figure  3:  a)  Selected  features  on  the  communicability  matrix  for  sex  classification,  b) 
Selected  features  on  the  edge  betweenness  centrality  matrix  for  kinship  classification.  Color 
code  corresponds  to  the  score  given  by  the  feature  selection  algorithm. 
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Figure  4:  Z-score  sex  differences  from  the  connectivity  matrix.  The  color  map  indicates 
where  the  probability  of  connection  is  higher  for  women  (magenta)  or  for  men  (cyan). 
Color  code  corresponds  to  the  score  given  by  the  feature  selection  algorithm. 
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Figure  5:  Z-score  Kinship  differences  using  the  connectivity  matrix,  a)  Identical  twins  vs 
non-identical  multiples,  b)  identical  twins  vs  siblings,  c)  identical  twins  vs  unrelated,  d) 
non-identical  multiples  vs  siblings,  e)  non-identical  multiples  vs  unrelated,  and  f)  siblings 
vs  unrelated.  The  color  map  indicates  where  the  differences  are  higher  for  the  first  group 
(magenta)  or  for  the  second  (cyan). 
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Nodes  Eigenvalues 


Figure  6:  Sex  differences  considering  a)  the  clustering  coefficient,  b)  the  communicability 
eigenvalues. 
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Figure  7:  Sex  differences  considering  a)  the  edge  betweenness  centrality,  b)  the  communi¬ 
cability  matrix. 

Figure 8: Z-score kinship differences considering the communicability eigenvalues: a) identical twins vs non-identical multiples, b) identical twins vs siblings, c) identical twins vs unrelated, d) non-identical multiples vs siblings, e) non-identical multiples vs unrelated, and f) siblings vs unrelated.

Figure 9: Z-score kinship differences considering edge betweenness centrality: a) identical twins vs non-identical multiples, b) identical twins vs siblings, c) identical twins vs unrelated, d) non-identical multiples vs siblings, e) non-identical multiples vs unrelated, and f) siblings vs unrelated.

Table 1: Cortical labels. Labels 1 (left) and 36 (right) were reserved for non-cortical surfaces.

Left hemisphere   Right hemisphere   Region
2                 37                 Caudal anterior cingulate
3                 38                 Caudal middle frontal
4                 39                 Corpus callosum
5                 40                 Cuneus
6                 41                 Entorhinal
7                 42                 Fusiform
8                 43                 Inferior parietal
9                 44                 Inferior temporal
10                45                 Isthmus of the cingulate
11                46                 Lateral occipital
12                47                 Lateral orbitofrontal
13                48                 Lingual
14                49                 Medial orbitofrontal
15                50                 Middle temporal
16                51                 Parahippocampal
17                52                 Paracentral
18                53                 Pars opercularis
19                54                 Pars orbitalis
20                55                 Pars triangularis
21                56                 Peri-calcarine
22                57                 Postcentral
23                58                 Posterior cingulate
24                59                 Pre-central
25                60                 Precuneus
26                61                 Rostral anterior cingulate
27                62                 Rostral middle frontal
28                63                 Superior frontal
29                64                 Superior parietal
30                65                 Superior temporal
31                66                 Supra-marginal
32                67                 Frontal pole
33                68                 Temporal pole
34                69                 Transverse temporal
35                70                 Insula


Table 2: Classification performance (see Section 4.1) using the "raw" connectivity matrices and different normalizations.

Test                                  Equation (1)   Equation (2)   Equation (3)
SEX
Accuracy (%)                          88.1           89.9           93.0
Sensitivity (%)                       92.3           93.8           95.5
Specificity (%)                       80.8           83.1           88.5
BER                                   0.1345         0.1156         0.0797
ROC area                              0.8655         0.8844         0.9203
Kappa                                 0.7397         0.7788         0.8470
p-value                               0.001          0.001          0.001
KINSHIP
Accuracy (%)                          89.7           87.4           88.5
Sensitivity Identical Twins (%)       87.6           72.0           80.4
Specificity Identical Twins (%)       95.4           94.1           94.5
Sensitivity Non-identical Twins (%)   83.9           82.2           86.1
Specificity Non-identical Twins (%)   95.9           93.4           96.1
Sensitivity Siblings (%)              74.7           83.1           72.3
Specificity Siblings (%)              96.9           97.7           97.3
Sensitivity Unrelated People (%)      99.9           100.0          99.9
Specificity Unrelated People (%)      98.5           98.2           96.92
BER                                   0.1346         0.1568         0.1535
ROC area                              0.9161         0.9009         0.9040
Kappa                                 0.8556         0.8222         0.8380
p-value                               0.001          0.001          0.001

Table 3: Classification performance (see Section 4.1) using the generalized communicability matrix and different normalizations.

Test                                  Equation (1)   Equation (2)   Equation (3)
SEX
Accuracy (%)                          88.1           88.3           92.2
Sensitivity (%)                       91.4           90.3           93.7
Specificity (%)                       82.4           84.7           89.6
BER                                   0.1311         0.1247         0.0835
ROC area                              0.8689         0.8753         0.9165
Kappa                                 0.7417         0.7475         0.8320
p-value                               0.001          0.001          0.001
KINSHIP
Accuracy (%)                          85.8           83.6           86.7
Sensitivity Identical Twins (%)       74.8           67.8           71.7
Specificity Identical Twins (%)       94.3           93.6           94.7
Sensitivity Non-identical Twins (%)   83.1           72.6           85.2
Specificity Non-identical Twins (%)   92.3           90.3           94.4
Sensitivity Siblings (%)              66.6           82.5           74.0
Specificity Siblings (%)              96.7           97.4           97.4
Sensitivity Unrelated People (%)      99.9           99.2           99.7
Specificity Unrelated People (%)      98.0           96.8           95.5
BER                                   0.1891         0.1950         0.1735
ROC area                              0.8821         0.8750         0.8908
Kappa                                 0.8000         0.7684         0.8121
p-value                               0.001          0.001          0.001

Table 4: Classification performance (see Section 4.1) using edge betweenness centrality and different normalizations.

Test                                  Equation (1)   Equation (2)   Equation (3)
SEX
Accuracy (%)                          93.7           80.7           92.5
Sensitivity (%)                       96.4           87.2           97.1
Specificity (%)                       89.1           69.2           84.5
BER                                   0.0727         0.2178         0.0923
ROC area                              0.927          0.7822         0.9077
Kappa                                 0.8631         0.5748         0.8341
p-value                               0.001          0.001          0.001
KINSHIP
Accuracy (%)                          75.2           75.5           87.3
Sensitivity Identical Twins (%)       53.0           56.4           76.4
Specificity Identical Twins (%)       91.8           91.4           97.0
Sensitivity Non-identical Twins (%)   74.0           72.9           86.7
Specificity Non-identical Twins (%)   89.6           90.5           92.0
Sensitivity Siblings (%)              54.0           45.9           70.9
Specificity Siblings (%)              95.9           94.8           97.5
Sensitivity Unrelated People (%)      94.4           97.3           98.8
Specificity Unrelated People (%)      88.3           89.8           96.1
BER                                   0.3113         0.3190         0.1677
ROC area                              0.8013         0.7987         0.8945
Kappa                                 0.6460         0.6512         0.8201
p-value                               0.001          0.001          0.001

Table 5: Sex confusion matrices when classifying directly from the connectivity matrix.

All features        Women    Men
Women               109      84
Men                 69       41

Feature selection   Women    Men
Women               184.4    8.6
Men                 12.6     97.4
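
As an illustrative check, the summary statistics reported for sex classification can be approximately recovered from the feature-selection confusion matrix of Table 5. The sketch below assumes that rows hold the true class and columns the predicted class, that women are the positive class, and that BER and Cohen's kappa follow their standard definitions; the function name `binary_metrics` is hypothetical and the exact definitions used in Section 4.1 may differ.

```python
def binary_metrics(cm):
    """Summary metrics from a 2x2 confusion matrix.

    Assumed layout: cm = [[TP, FN], [FP, TN]], i.e. rows are the true
    class and columns the predicted class (an assumption, not stated
    in the table itself).
    """
    (tp, fn), (fp, tn) = cm
    total = tp + fn + fp + tn

    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate

    # Balanced error rate: mean of the two per-class error rates.
    ber = 1.0 - 0.5 * (sensitivity + specificity)

    # Cohen's kappa: agreement corrected for chance agreement.
    p_expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (accuracy - p_expected) / (1.0 - p_expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ber": ber,
        "kappa": kappa,
    }


# Feature-selection matrix from Table 5 (entries are non-integer,
# presumably averages over repeated runs).
feature_selection_cm = [[184.4, 8.6], [12.6, 97.4]]
metrics = binary_metrics(feature_selection_cm)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the accuracy, sensitivity, and specificity agree with the Equation (3) column of Table 2 to one decimal place, while BER and kappa agree to about three decimals; small discrepancies are expected if the reported values were averaged per run rather than computed from the averaged matrix.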

Table 6: Kinship confusion matrices when classifying directly from the connectivity matrix.

All features              Identical Twins   Non-identical Multiples   Siblings   Unrelated
Identical Twins           14                26                        5          5
Non-identical Multiples   16                36                        12         13
Siblings                  9                 15                        10         1
Unrelated                 0                 0                         0          100

Feature selection         Identical Twins   Non-identical Multiples   Siblings   Unrelated
Identical Twins           40.2              4.1                       3.7        2.0
Non-identical Multiples   4.8               57.7                      2.1        2.4
Siblings                  6.3               3.1                       25.3       0.3
Unrelated                 0.0               0.1                       0.0        99.9
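
The per-class sensitivities and specificities reported for kinship classification can be related to the feature-selection matrix of Table 6 with a one-vs-rest reading of the confusion matrix. The sketch below is only an illustration: it assumes rows are true classes and columns predicted classes, and the helper name `one_vs_rest_metrics` is hypothetical.

```python
def one_vs_rest_metrics(cm, labels):
    """Overall accuracy plus per-class (sensitivity, specificity).

    Assumed layout: cm[i][j] counts samples of true class i that were
    predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    per_class = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # class k predicted as another class
        fp = sum(cm[r][k] for r in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp
        per_class[name] = (tp / (tp + fn), tn / (tn + fp))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return accuracy, per_class


labels = ["Identical Twins", "Non-identical Multiples", "Siblings", "Unrelated"]

# Feature-selection matrix from Table 6.
kinship_cm = [
    [40.2, 4.1, 3.7, 2.0],
    [4.8, 57.7, 2.1, 2.4],
    [6.3, 3.1, 25.3, 0.3],
    [0.0, 0.1, 0.0, 99.9],
]

accuracy, per_class = one_vs_rest_metrics(kinship_cm, labels)
print(f"accuracy: {100 * accuracy:.1f}%")
for name, (sens, spec) in per_class.items():
    print(f"{name}: sensitivity {100 * sens:.1f}%, specificity {100 * spec:.1f}%")
```

With this reading, the computed values closely track the kinship rows in the Equation (3) column of Table 2, which supports the assumed row and column orientation.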

Table 7: Best connectivity features for sex classification.


Region  1 

Region  2 

Corpus  callosum  (L) 

Noncortical  (L) 

Inferior  temporal  (L) 

Noncortical  (L) 

Isthmus  of  the  cingulate  (L) 

Noncortical  (L) 

Lingual  (L) 

Noncortical  (L) 

Superior  temporal  (L) 

Noncortical  (L) 

Insula  (L) 

Noncortical  (L) 

Precuneus  (L) 

Caudal  anterior  cingulate  (L) 

Medial  orbitofrontal  (R) 

Caudal  anterior  cingulate  (L) 

Isthmus  of  the  cingulate  (L) 

Caudal  middle  frontal  (L) 

Inferior  temporal  (L) 

Corpus  callosum  (L) 

Lateral  occipital  (L) 

Corpus  callosum  (L) 

Pars  orbitalis  (L) 

Corpus  callosum  (L) 

Posterior  cingulate  (L) 

Corpus  callosum  (L) 

Frontal  pole  (L) 

Corpus  callosum  (L) 

Lateral  orbitofrontal  (R) 

Corpus  callosum  (L) 

Lingual  (R) 

Corpus  callosum  (L) 

Peri-calcarine  (R) 

Corpus  callosum  (L) 

Frontal  pole  (R) 

Corpus  callosum  (L) 

Superior  temporal  (L) 

Cuneus  (L) 

Isthmus  of  the  cingulate  (R) 

Cuneus  (L) 

Lingual  (L) 

Entorhinal  (L) 

Parahippocampal  (L) 

Entorhinal  (L) 

Fusiform  (L) 

Inferior  parietal  (L) 

Lingual  (L) 

Inferior  parietal  (L) 

Corpus  callosum  (L) 

Inferior  temporal  (L) 

Inferior  parietal  (L) 

Inferior  temporal  (L) 

Inferior  temporal  (L) 

Inferior  temporal  (L) 

Medial  orbitofrontal  (L) 

Inferior  temporal  (L) 

Superior  temporal  (L) 

Inferior  temporal  (L) 

Caudal  anterior  cingulate  (L) 

Isthmus  of  the  cingulate  (L) 

Caudal  middle  frontal  (L) 

Isthmus  of  the  cingulate  (L) 

Parahippocampal  (L) 

Isthmus  of  the  cingulate  (L) 

Cuneus  (R) 

Isthmus  of  the  cingulate  (L) 

Peri-calcarine  (R) 

Isthmus  of  the  cingulate  (L) 

Corpus  callosum  (L) 

Lateral  occipital  (L) 

Middle  temporal  (L) 

Lateral  occipital  (L) 

Superior  parietal  (L) 

Lateral  occipital  (L) 

Superior  temporal  (L) 

Lateral  occipital  (L) 

Cuneus  (R) 

Lateral  occipital  (L) 

Lingual  (R) 

Lateral  occipital  (L) 

Pars  orbitalis  (L) 

Lateral  orbitofrontal  (L) 

Pars  triangularis  (L) 

Lateral  orbitofrontal  (L) 

Pre-central  (L) 

Lateral  orbitofrontal  (L) 

Frontal  pole  (L) 

Lateral  orbitofrontal  (L) 

Noncortical  (L) 

Lingual  (L) 

Cuneus  (L) 

Lingual  (L) 

Inferior  temporal  (L) 

Medial  orbitofrontal  (L) 

Superior  temporal  (L) 

Medial  orbitofrontal  (L) 

Caudal  anterior  cingulate  (R) 

Medial  orbitofrontal  (L) 

Region  1 

Region  2 

Lingual  (R) 

Noncortical  (R) 

Pars  opercularis  (R) 

Noncortical  (R) 

Caudal  middle  frontal  (R) 

Caudal  middle  frontal  (R) 

Pre-central  (R) 

Caudal  middle  frontal  (R) 

Rostral  anterior  cingulate  (R) 

Caudal  middle  frontal  (R) 

Caudal  middle  frontal  (L) 

Corpus  callosum  (R) 

Caudal  anterior  cingulate  (R) 

Corpus  callosum  (R) 

Rostral  anterior  cingulate  (R) 

Corpus  callosum  (R) 

Insula  (R) 

Corpus  callosum  (R) 

Lateral  occipital  (L) 

Cuneus  (R) 

Lingual  (L) 

Cuneus  (R) 

Isthmus  of  the  cingulate  (R) 

Cuneus  (R) 

Superior  temporal  (R) 

Cuneus  (R) 

Precuneus  (L) 

Fusiform  (R) 

Inferior  parietal  (R) 

Fusiform  (R) 

Isthmus  of  the  cingulate  (R) 

Fusiform  (R) 

Precuneus  (R) 

Fusiform  (R) 

Rostral  middle  frontal  (R) 

Fusiform  (R) 

Supra-marginal  (R) 

Fusiform  (R) 

Paracentral  (R) 

Inferior  parietal  (R) 

Pars  opercularis  (R) 

Inferior  parietal  (R) 

Entorhinal  (R) 

Inferior  temporal  (R) 

Caudal  anterior  cingulate  (R) 

Isthmus  of  the  cingulate  (R) 

Corpus  callosum  (R) 

Isthmus  of  the  cingulate  (R) 

Cuneus  (R) 

Isthmus  of  the  cingulate  (R) 

Superior  frontal  (R) 

Isthmus  of  the  cingulate  (R) 

Fusiform  (R) 

Lateral  occipital  (R) 

Superior  parietal  (R) 

Lateral  occipital  (R) 

Caudal  anterior  cingulate  (L) 

Lateral  orbitofrontal  (R) 

Medial  orbitofrontal  (L) 

Lateral  orbitofrontal  (R) 

Superior  frontal  (L) 

Lateral  orbitofrontal  (R) 

Caudal  middle  frontal  (R) 

Lateral  orbitofrontal  (R) 

Corpus  callosum  (R) 

Lateral  orbitofrontal  (R) 

Parahippocampal  (R) 

Lateral  orbitofrontal  (R) 

Isthmus  of  the  cingulate  (L) 

Lingual  (R) 

Lingual  (L) 

Lingual  (R) 

Parahippocampal  (L) 

Lingual  (R) 

Superior  frontal  (L) 

Lingual  (R) 

Superior  parietal  (L) 

Lingual  (R) 

Corpus  callosum  (R) 

Lingual  (R) 

Paracentral  (R) 

Lingual  (R) 

Caudal  anterior  cingulate  (R) 

Medial  orbitofrontal  (R) 

Middle  temporal  (R) 

Medial  orbitofrontal  (R) 

Pars  orbitalis  (R) 

Medial  orbitofrontal  (R) 

Pre-central  (R) 

Medial  orbitofrontal  (R) 

Inferior  parietal  (R) 

Middle  temporal  (R) 

Isthmus  of  the  cingulate  (R) 

Middle  temporal  (R) 

Medial  orbitofrontal  (R) 

Middle  temporal  (R) 

Precuneus  (R) 

Middle  temporal  (R) 

Table 7 (continued from previous page)


Region  1 

Region  2 

Posterior  cingulate  (R) 

Medial  orbitofrontal  (L) 

Peri-calcarine  (L) 

Middle  temporal  (L) 

Transverse  temporal  (L) 

Middle  temporal  (L) 

Cuneus  (L) 

Parahippocampal  (L) 

Entorhinal  (L) 

Parahippocampal  (L) 

Fusiform  (L) 

Parahippocampal  (L) 

Inferior  parietal  (L) 

Parahippocampal  (L) 

Peri-calcarine  (L) 

Parahippocampal  (L) 

Superior  temporal  (L) 

Parahippocampal  (L) 

Temporal  pole  (L) 

Parahippocampal  (L) 

Transverse  temporal  (L) 

Parahippocampal  (L) 

Insula  (L) 

Parahippocampal  (L) 

Lingual  (R) 

Parahippocampal  (L) 

Postcentral  (L) 

Paracentral  (L) 

Posterior  cingulate  (L) 

Paracentral  (L) 

Superior  parietal  (L) 

Paracentral  (L) 

Paracentral  (R) 

Paracentral  (L) 

Posterior  cingulate  (R) 

Paracentral  (L) 

Precuneus  (R) 

Paracentral  (L) 

Postcentral  (L) 

Pars  opercularis  (L) 

Superior  temporal  (L) 

Pars  opercularis  (L) 

Pars  opercularis  (L) 

Pars  orbitalis  (L) 

Pars  triangularis  (L) 

Pars  orbitalis  (L) 

Rostral  middle  frontal  (L) 

Pars  triangularis  (L) 

Rostral  anterior  cingulate  (R) 

Pars  triangularis  (L) 

Supra-marginal  (L) 

Peri-calcarine  (L) 

Transverse  temporal  (L) 

Peri-calcarine  (L) 

Lingual  (R) 

Peri-calcarine  (L) 

Peri-calcarine  (R) 

Peri-calcarine  (L) 

Posterior  cingulate  (R) 

Peri-calcarine  (L) 

Precuneus  (R) 

Peri-calcarine  (L) 

Noncortical  (L) 

Postcentral  (L) 

Paracentral  (L) 

Postcentral  (L) 

Postcentral  (L) 

Postcentral  (L) 

Transverse  temporal  (L) 

Postcentral  (L) 

Lingual  (L) 

Posterior  cingulate  (L) 

Medial  orbitofrontal  (L) 

Posterior  cingulate  (L) 

Caudal  anterior  cingulate  (L) 

Pre-central  (L) 

Parahippocampal  (L) 

Pre-central  (L) 

Posterior  cingulate  (L) 

Pre-central  (L) 

Precuneus  (L) 

Pre-central  (L) 

Superior  temporal  (L) 

Pre-central  (L) 

Supra-marginal  (L) 

Pre-central  (L) 

Caudal  anterior  cingulate  (R) 

Pre-central  (L) 

Corpus  callosum  (R) 

Pre-central  (L) 

Posterior  cingulate  (R) 

Pre-central  (L) 

Superior  parietal  (R) 

Pre-central  (L) 

Caudal  anterior  cingulate  (L) 

Precuneus  (L) 

Cuneus  (L) 

Precuneus  (L) 

Fusiform  (L) 

Precuneus  (L) 

Region  1 

Region  2 

Superior  temporal  (R) 

Middle  temporal  (R) 

Entorhinal  (R) 

Parahippocampal  (R) 

Middle  temporal  (R) 

Parahippocampal  (R) 

Peri-calcarine  (R) 

Parahippocampal  (R) 

Precuneus  (R) 

Parahippocampal  (R) 

Temporal  pole  (R) 

Parahippocampal  (R) 

Insula  (R) 

Parahippocampal  (R) 

Inferior  parietal  (R) 

Paracentral  (R) 

Lingual  (R) 

Paracentral  (R) 

Postcentral  (R) 

Paracentral  (R) 

Posterior  cingulate  (R) 

Paracentral  (R) 

Noncortical  (R) 

Pars  opercularis  (R) 

Lateral  orbitofrontal  (R) 

Pars  opercularis  (R) 

Rostral  middle  frontal  (R) 

Pars  opercularis  (R) 

Superior  parietal  (R) 

Pars  opercularis  (R) 

Insula  (R) 

Pars  opercularis  (R) 

Corpus  callosum  (L) 

Pars  orbitalis  (R) 

Posterior  cingulate  (L) 

Pars  orbitalis  (R) 

Rostral  anterior  cingulate  (L) 

Pars  orbitalis  (R) 

Corpus  callosum  (R) 

Pars  orbitalis  (R) 

Rostral  middle  frontal  (R) 

Pars  orbitalis  (R) 

Caudal  anterior  cingulate  (L) 

Pars  triangularis  (R) 

Pars  orbitalis  (R) 

Pars  triangularis  (R) 

Corpus  callosum  (L) 

Peri-calcarine  (R) 

Lateral  occipital  (L) 

Peri-calcarine  (R) 

Peri-calcarine  (L) 

Peri-calcarine  (R) 

Noncortical  (R) 

Peri-calcarine  (R) 

Corpus  callosum  (R) 

Peri-calcarine  (R) 

Fusiform  (R) 

Peri-calcarine  (R) 

Superior  parietal  (R) 

Peri-calcarine  (R) 

Paracentral  (R) 

Postcentral  (R) 

Supra-marginal  (R) 

Postcentral  (R) 

Insula  (R) 

Postcentral  (R) 

Cuneus  (L) 

Posterior  cingulate  (R) 

Medial  orbitofrontal  (L) 

Posterior  cingulate  (R) 

Paracentral  (L) 

Posterior  cingulate  (R) 

Peri-calcarine  (L) 

Posterior  cingulate  (R) 

Pre-central  (L) 

Posterior  cingulate  (R) 

Lateral  orbitofrontal  (R) 

Posterior  cingulate  (R) 

Peri-calcarine  (R) 

Posterior  cingulate  (R) 

Lateral  orbitofrontal  (R) 

Pre-central  (R) 

Pars  opercularis  (R) 

Pre-central  (R) 

Caudal  anterior  cingulate  (L) 

Precuneus  (R) 

Inferior  temporal  (L) 

Precuneus  (R) 

Fusiform  (R) 

Precuneus  (R) 

Inferior  temporal  (R) 

Precuneus  (R) 

Middle  temporal  (R) 

Precuneus  (R) 

Caudal  anterior  cingulate  (L) 

Rostral  anterior  cingulate  (R) 

Pre-central  (R) 

Rostral  anterior  cingulate  (R) 

Caudal  anterior  cingulate  (L) 

Rostral  middle  frontal  (R) 

Table 7 (continued from previous page)


Region  1 

Region  2 

Pre-central  (L) 

Rostral  middle  frontal  (R) 

Rostral  anterior  cingulate  (L) 

Rostral  middle  frontal  (R) 

Pars  opercularis  (R) 

Rostral  middle  frontal  (R) 

Pars  orbitalis  (R) 

Rostral  middle  frontal  (R) 

Rostral  anterior  cingulate  (L) 

Superior  frontal  (R) 

Isthmus  of  the  cingulate  (R) 

Superior  frontal  (R) 

Lateral  orbitofrontal  (R) 

Superior  frontal  (R) 

Paracentral  (R) 

Superior  frontal  (R) 

Pars  triangularis  (R) 

Superior  frontal  (R) 

Posterior  cingulate  (R) 

Superior  frontal  (R) 

Frontal  pole  (R) 

Superior  frontal  (R) 

Insula  (R) 

Superior  frontal  (R) 

Posterior  cingulate  (L) 

Superior  parietal  (R) 

Caudal  anterior  cingulate  (R) 

Superior  parietal  (R) 

Corpus  callosum  (R) 

Superior  parietal  (R) 

Isthmus  of  the  cingulate  (R) 

Superior  parietal  (R) 

Pars  opercularis  (R) 

Superior  parietal  (R) 

Peri-calcarine  (R) 

Superior  parietal  (R) 

Postcentral  (R) 

Superior  parietal  (R) 

Transverse  temporal  (R) 

Superior  parietal  (R) 

Cuneus  (R) 

Superior  temporal  (R) 

Inferior  parietal  (R) 

Superior  temporal  (R) 

Isthmus  of  the  cingulate  (R) 

Superior  temporal  (R) 

Pars  triangularis  (R) 

Superior  temporal  (R) 

Peri-calcarine  (R) 

Superior  temporal  (R) 

Transverse  temporal  (R) 

Superior  temporal  (R) 

Isthmus  of  the  cingulate  (L) 

Supra-marginal  (R) 

Cuneus  (R) 

Supra-marginal  (R) 

Fusiform  (R) 

Supra-marginal  (R) 

Inferior  temporal  (R) 

Supra-marginal  (R) 

Lingual  (R) 

Supra-marginal  (R) 

Rostral  anterior  cingulate  (L) 

Frontal  pole  (R) 

Rostral  anterior  cingulate  (R) 

Frontal  pole  (R) 

Parahippocampal  (R) 

Temporal  pole  (R) 

Superior  temporal  (R) 

Temporal  pole  (R) 

Temporal  pole  (R) 

Temporal  pole  (R) 

Insula  (R) 

Temporal  pole  (R) 

Fusiform  (R) 

Transverse  temporal  (R) 

Middle  temporal  (R) 

Transverse  temporal  (R) 

Peri-calcarine  (R) 

Transverse  temporal  (R) 

Superior  temporal  (R) 

Transverse  temporal  (R) 

Caudal  anterior  cingulate  (L) 

Insula  (R) 

Superior  frontal  (L) 

Insula  (R) 

Corpus  callosum  (R) 

Insula  (R) 

Parahippocampal  (R) 

Insula  (R) 

Pars  triangularis  (R) 

Insula  (R) 

Superior  frontal  (R) 

Insula  (R) 

Supra-marginal  (R) 

Insula  (R) 

Transverse  temporal  (R) 

Insula  (R) 

Region  1 

Region  2 

Pars  opercularis  (L) 

Precuneus  (L) 

Posterior  cingulate  (L) 

Precuneus  (L) 

Transverse  temporal  (L) 

Precuneus  (L) 

Insula  (L) 

Precuneus  (L) 

Pre-central  (R) 

Precuneus  (L) 

Pars  orbitalis  (L) 

Rostral  anterior  cingulate  (L) 

Superior  temporal  (L) 

Rostral  anterior  cingulate  (L) 

Insula  (L) 

Rostral  anterior  cingulate  (L) 

Caudal  middle  frontal  (R) 

Rostral  anterior  cingulate  (L) 

Caudal  middle  frontal  (L) 

Rostral  middle  frontal  (L) 

Medial  orbitofrontal  (L) 

Rostral  middle  frontal  (L) 

Pars  orbitalis  (L) 

Rostral  middle  frontal  (L) 

Rostral  anterior  cingulate  (L) 

Rostral  middle  frontal  (L) 

Superior  temporal  (L) 

Rostral  middle  frontal  (L) 

Supra-marginal  (L) 

Rostral  middle  frontal  (L) 

Isthmus  of  the  cingulate  (L) 

Superior  frontal  (L) 

Paracentral  (L) 

Superior  frontal  (L) 

Caudal  middle  frontal  (R) 

Superior  frontal  (L) 

Medial  orbitofrontal  (R) 

Superior  frontal  (L) 

Fusiform  (L) 

Superior  parietal  (L) 

Lateral  occipital  (L) 

Superior  parietal  (L) 

Postcentral  (L) 

Superior  parietal  (L) 

Posterior  cingulate  (L) 

Superior  parietal  (L) 

Insula  (L) 

Superior  parietal  (L) 

Isthmus  of  the  cingulate  (R) 

Superior  parietal  (L) 

Paracentral  (R) 

Superior  parietal  (L) 

Corpus  callosum  (L) 

Superior  temporal  (L) 

Middle  temporal  (L) 

Superior  temporal  (L) 

Pars  triangularis  (L) 

Superior  temporal  (L) 

Pre-central  (L) 

Superior  temporal  (L) 

Rostral  middle  frontal  (L) 

Superior  temporal  (L) 

Supra-marginal  (L) 

Superior  temporal  (L) 

Inferior  parietal  (L) 

Supra-marginal  (L) 

Rostral  middle  frontal  (L) 

Supra-marginal  (L) 

Superior  frontal  (L) 

Supra-marginal  (L) 

Superior  parietal  (L) 

Supra-marginal  (L) 

Insula  (L) 

Supra-marginal  (L) 

Caudal  anterior  cingulate  (R) 

Frontal  pole  (L) 

Rostral  middle  frontal  (R) 

Frontal  pole  (L) 

Temporal  pole  (L) 

Temporal  pole  (L) 

Fusiform  (L) 

Transverse  temporal  (L) 

Lingual  (L) 

Transverse  temporal  (L) 

Middle  temporal  (L) 

Transverse  temporal  (L) 

Parahippocampal  (L) 

Transverse  temporal  (L) 

Postcentral  (L) 

Insula  (L) 

Precuneus  (L) 

Insula  (L) 

Superior  parietal  (L) 

Insula  (L) 

Temporal  pole  (L) 

Insula  (L) 

Precuneus  (L) 

Noncortical  (R) 

Inferior  parietal  (R) 

Noncortical  (R) 

Table 8: Best connectivity features for kinship classification.


Region  1 

Region  2 

Noncortical  (L) 

Noncortical  (L) 

Cuneus  (L) 

Noncortical  (L) 

Fusiform  (L) 

Noncortical  (L) 

Postcentral  (L) 

Noncortical  (L) 

Paracentral  (L) 

Caudal  anterior  cingulate  (L) 

Corpus  callosum  (R) 

Caudal  anterior  cingulate  (L) 

Paracentral  (R) 

Caudal  anterior  cingulate  (L) 

Caudal  middle  frontal  (L) 

Caudal  middle  frontal  (L) 

Paracentral  (L) 

Caudal  middle  frontal  (L) 

Pars  opercularis  (L) 

Caudal  middle  frontal  (L) 

Corpus  callosum  (R) 

Caudal  middle  frontal  (L) 

Posterior  cingulate  (R) 

Caudal  middle  frontal  (L) 

Postcentral  (L) 

Corpus  callosum  (L) 

Superior  parietal  (L) 

Corpus  callosum  (L) 

Frontal  pole  (L) 

Corpus  callosum  (L) 

Frontal  pole  (R) 

Corpus  callosum  (L) 

Noncortical  (L) 

Cuneus  (L) 

Middle  temporal  (L) 

Cuneus  (L) 

Temporal  pole  (L) 

Entorhinal  (L) 

Noncortical  (L) 

Fusiform  (L) 

Lateral  occipital  (L) 

Fusiform  (L) 

Lingual  (L) 

Fusiform  (L) 

Temporal  pole  (L) 

Fusiform  (L) 

Noncortical  (L) 

Inferior  parietal  (L) 

Fusiform  (L) 

Inferior  parietal  (L) 

Isthmus  of  the  cingulate  (L) 

Inferior  parietal  (L) 

Lateral  occipital  (L) 

Inferior  parietal  (L) 

Lingual  (L) 

Inferior  parietal  (L) 

Postcentral  (L) 

Inferior  parietal  (L) 

Inferior  parietal  (L) 

Inferior  temporal  (L) 

Inferior  temporal  (L) 

Inferior  temporal  (L) 

Isthmus  of  the  cingulate  (L) 

Inferior  temporal  (L) 

Lateral  occipital  (L) 

Inferior  temporal  (L) 

Parahippocampal  (L) 

Inferior  temporal  (L) 

Temporal  pole  (L) 

Inferior  temporal  (L) 

Caudal  anterior  cingulate  (L) 

Isthmus  of  the  cingulate  (L) 

Postcentral  (L) 

Isthmus  of  the  cingulate  (L) 

Supra-marginal  (L) 

Isthmus  of  the  cingulate  (L) 

Caudal  anterior  cingulate  (R) 

Isthmus  of  the  cingulate  (L) 

Peri-calcarine  (R) 

Isthmus  of  the  cingulate  (L) 

Postcentral  (R) 

Isthmus  of  the  cingulate  (L) 

Inferior  parietal  (L) 

Lateral  occipital  (L) 

Supra-marginal  (L) 

Lateral  occipital  (L) 

Pars  orbitalis  (L) 

Lateral  orbitofrontal  (L) 

Frontal  pole  (L) 

Lateral  orbitofrontal  (L) 

Inferior  parietal  (L) 

Lingual  (L) 

Region  1 

Region  2 

Precuneus  (L) 

Noncortical  (R) 

Inferior  parietal  (R) 

Noncortical  (R) 

Lingual  (R) 

Noncortical  (R) 

Temporal  pole  (R) 

Noncortical  (R) 

Caudal  anterior  cingulate  (L) 

Caudal  middle  frontal  (R) 

Precuneus  (L) 

Caudal  middle  frontal  (R) 

Supra-marginal  (R) 

Caudal  middle  frontal  (R) 

Precuneus  (L) 

Corpus  callosum  (R) 

Posterior  cingulate  (R) 

Corpus  callosum  (R) 

Peri-calcarine  (L) 

Cuneus  (R) 

Lateral  occipital  (R) 

Cuneus  (R) 

Parahippocampal  (R) 

Entorhinal  (R) 

Precuneus  (L) 

Fusiform  (R) 

Entorhinal  (R) 

Fusiform  (R) 

Fusiform  (R) 

Fusiform  (R) 

Lateral  occipital  (R) 

Fusiform  (R) 

Precuneus  (R) 

Fusiform  (R) 

Noncortical  (R) 

Inferior  parietal  (R) 

Pars  triangularis  (R) 

Inferior  parietal  (R) 

Temporal  pole  (R) 

Inferior  parietal  (R) 

Entorhinal  (R) 

Inferior  temporal  (R) 

Temporal  pole  (R) 

Inferior  temporal  (R) 

Caudal  anterior  cingulate  (L) 

Isthmus  of  the  cingulate  (R) 

Lateral  occipital  (R) 

Isthmus  of  the  cingulate  (R) 

Isthmus  of  the  cingulate  (L) 

Lateral  occipital  (R) 

Isthmus  of  the  cingulate  (R) 

Lateral  occipital  (R) 

Middle  temporal  (R) 

Lateral  occipital  (R) 

Supra-marginal  (R) 

Lateral  occipital  (R) 

Caudal  middle  frontal  (R) 

Lateral  orbitofrontal  (R) 

Entorhinal  (R) 

Lateral  orbitofrontal  (R) 

Posterior  cingulate  (R) 

Lateral  orbitofrontal  (R) 

Cuneus  (R) 

Lingual  (R) 

Entorhinal  (R) 

Lingual  (R) 

Supra-marginal  (R) 

Lingual  (R) 

Corpus  callosum  (L) 

Medial  orbitofrontal  (R) 

Parahippocampal  (R) 

Medial  orbitofrontal  (R) 

Rostral  middle  frontal  (R) 

Medial  orbitofrontal  (R) 

Insula  (R) 

Medial  orbitofrontal  (R) 

Entorhinal  (R) 

Middle  temporal  (R) 

Inferior  parietal  (R) 

Middle  temporal  (R) 

Lateral  occipital  (R) 

Middle  temporal  (R) 

Parahippocampal  (R) 

Middle  temporal  (R) 

Insula  (R) 

Middle  temporal  (R) 

Isthmus  of  the  cingulate  (L) 

Parahippocampal  (R) 

Entorhinal  (R) 

Parahippocampal  (R) 

Lingual  (R) 

Parahippocampal  (R) 

Table 8 (continued from previous page)


Region  1 

Region  2 

Inferior  temporal  (L) 

Medial  orbitofrontal  (L) 

Paracentral  (L) 

Medial  orbitofrontal  (L) 

Frontal  pole  (L) 

Medial  orbitofrontal  (L) 

Cuneus  (L) 

Middle  temporal  (L) 

Inferior  parietal  (L) 

Middle  temporal  (L) 

Lateral  occipital  (L) 

Middle  temporal  (L) 

Lateral  orbitofrontal  (L) 

Middle  temporal  (L) 

Noncortical  (L) 

Parahippocampal  (L) 

Corpus  callosum  (L) 

Parahippocampal  (L) 

Lateral  occipital  (L) 

Parahippocampal  (L) 

Corpus  callosum  (R) 

Parahippocampal  (L) 

Medial  orbitofrontal  (L) 

Paracentral  (L) 

Superior  parietal  (L) 

Paracentral  (L) 

Paracentral  (R) 

Paracentral  (L) 

Caudal  middle  frontal  (L) 

Pars  opercularis  (L) 

Superior  temporal  (L) 

Pars  opercularis  (L) 

Corpus  callosum  (R) 

Pars  opercularis  (L) 

Caudal  anterior  cingulate  (L) 

Pars  orbitalis  (L) 

Caudal  anterior  cingulate  (L) 

Pars  triangularis  (L) 

Pars  opercularis  (L) 

Pars  triangularis  (L) 

Pars  orbitalis  (L) 

Pars  triangularis  (L) 

Pars  triangularis  (L) 

Pars  triangularis  (L) 

Insula  (L) 

Pars  triangularis  (L) 

Transverse  temporal  (L) 

Peri-calcarine  (L) 

Cuneus  (R) 

Peri-calcarine  (L) 

Posterior  cingulate  (L) 

Postcentral  (L) 

Pre-central  (L) 

Postcentral  (L) 

Precuneus  (L) 

Postcentral  (L) 

Superior  parietal  (L) 

Postcentral  (L) 

Superior  temporal  (L) 

Postcentral  (L) 

Precuneus  (R) 

Postcentral  (L) 

Caudal  anterior  cingulate  (R) 

Posterior  cingulate  (L) 

Corpus  callosum  (R) 

Posterior  cingulate  (L) 

Posterior  cingulate  (R) 

Posterior  cingulate  (L) 

Transverse  temporal  (L) 

Pre-central  (L) 

Superior  parietal  (R) 

Pre-central  (L) 

Cuneus  (L) 

Precuneus  (L) 

Lingual  (L) 

Precuneus  (L) 

Paracentral  (L) 

Precuneus  (L) 

Postcentral  (L) 

Precuneus  (L) 

Caudal  middle  frontal  (R) 

Precuneus  (L) 

Corpus  callosum  (R) 

Precuneus  (L) 

Fusiform  (R) 

Precuneus  (L) 

Isthmus  of  the  cingulate  (R) 

Precuneus  (L) 

Posterior  cingulate  (R) 

Precuneus  (L) 

Caudal  anterior  cingulate  (L) 

Rostral  anterior  cingulate  (L) 

Inferior  temporal  (L) 

Rostral  anterior  cingulate  (L) 

Parahippocampal  (L) 

Rostral  anterior  cingulate  (L) 

Pars  orbitalis  (L) 

Rostral  anterior  cingulate  (L) 

Rostral  middle  frontal  (L) 

Rostral  anterior  cingulate  (L) 

Region  1 

Region  2 

Middle  temporal  (R) 

Parahippocampal  (R) 

Temporal  pole  (R) 

Parahippocampal  (R) 

Corpus  callosum  (L) 

Paracentral  (R) 

Pre-central  (R) 

Paracentral  (R) 

Transverse  temporal  (R) 

Paracentral  (R) 

Insula  (R) 

Paracentral  (R) 

Superior  frontal  (L) 

Pars  opercularis  (R) 

Pars  orbitalis  (R) 

Pars  opercularis  (R) 

Pre-central  (R) 

Pars  opercularis  (R) 

Insula  (R) 

Pars  opercularis  (R) 

Rostral  anterior  cingulate  (L) 

Pars  orbitalis  (R) 

Superior  frontal  (L) 

Pars  orbitalis  (R) 

Superior  frontal  (R) 

Pars  orbitalis  (R) 

Rostral  anterior  cingulate  (L) 

Pars  triangularis  (R) 

Entorhinal  (R) 

Pars  triangularis  (R) 

Inferior  parietal  (R) 

Pars  triangularis  (R) 

Medial  orbitofrontal  (R) 

Pars  triangularis  (R) 

Supra-marginal  (R) 

Pars  triangularis  (R) 

Pars  opercularis  (R) 

Postcentral  (R) 

Caudal  anterior  cingulate  (L) 

Posterior  cingulate  (R) 

Corpus  callosum  (L) 

Posterior  cingulate  (R) 

Isthmus  of  the  cingulate  (L) 

Posterior  cingulate  (R) 

Corpus  callosum  (R) 

Posterior  cingulate  (R) 

Isthmus  of  the  cingulate  (R) 

Posterior  cingulate  (R) 

Lateral  orbitofrontal  (R) 

Posterior  cingulate  (R) 

Lingual  (R) 

Posterior  cingulate  (R) 

Insula  (R) 

Posterior  cingulate  (R) 

Pars  triangularis  (R) 

Pre-central  (R) 

Insula  (R) 

Pre-central  (R) 

Inferior  parietal  (L) 

Precuneus  (R) 

Inferior  temporal  (L) 

Precuneus  (R) 

Postcentral  (L) 

Precuneus  (R) 

Lateral  occipital  (R) 

Precuneus  (R) 

Lingual  (R) 

Precuneus  (R) 

Paracentral  (R) 

Precuneus  (R) 

Pre-central  (R) 

Precuneus  (R) 

Corpus  callosum  (L) 

Rostral  anterior  cingulate  (R) 

Frontal  pole  (R) 

Rostral  anterior  cingulate  (R) 

Caudal  middle  frontal  (L) 

Rostral  middle  frontal  (R) 

Rostral  anterior  cingulate  (L) 

Rostral  middle  frontal  (R) 

Caudal  anterior  cingulate  (R) 

Rostral  middle  frontal  (R) 

Superior  frontal  (R) 

Rostral  middle  frontal  (R) 

Medial  orbitofrontal  (L) 

Superior  frontal  (R) 

Postcentral  (L) 

Superior  frontal  (R) 

Medial  orbitofrontal  (R) 

Superior  frontal  (R) 

Paracentral  (R) 

Superior  frontal  (R) 

Pars  opercularis  (R) 

Superior  frontal  (R) 

Rostral  middle  frontal  (R) 

Superior  frontal  (R) 

Corpus  callosum  (L) 

Superior  parietal  (R) 

Isthmus  of  the  cingulate  (L) 

Superior  parietal  (R) 

Table 8 (continued from previous page)


Region  1 

Region  2 

Isthmus  of  the  cingulate  (R) 

Superior  parietal  (R) 

Pre-central  (R) 

Superior  parietal  (R) 

Lateral  occipital  (R) 

Superior  temporal  (R) 

Parahippocampal  (R) 

Superior  temporal  (R) 

Superior  frontal  (R) 

Superior  temporal  (R) 

Supra-marginal  (R) 

Superior  temporal  (R) 

Transverse  temporal  (R) 

Superior  temporal  (R) 

Corpus  callosum  (L) 

Supra-marginal  (R) 

Inferior  temporal  (R) 

Supra-marginal  (R) 

Pars  triangularis  (R) 

Supra-marginal  (R) 

Postcentral  (R) 

Supra-marginal  (R) 

Precuneus  (R) 

Supra-marginal  (R) 

Rostral  anterior  cingulate  (L) 

Frontal  pole  (R) 

Medial  orbitofrontal  (R) 

Frontal  pole  (R) 

Rostral  anterior  cingulate  (R) 

Frontal  pole  (R) 

Frontal  pole  (R) 

Frontal  pole  (R) 

Noncortical  (R) 

Temporal  pole  (R) 

Inferior  parietal  (R) 

Temporal  pole  (R) 

Temporal  pole  (R) 

Temporal  pole  (R) 

Lingual  (R) 

Transverse  temporal  (R) 

Middle  temporal  (R) 

Transverse  temporal  (R) 

Superior  temporal  (R) 

Transverse  temporal  (R) 

Transverse  temporal  (R) 

Transverse  temporal  (R) 

Corpus  callosum  (L) 

Insula  (R) 

Rostral  anterior  cingulate  (L) 

Insula  (R) 

Entorhinal  (R) 

Insula  (R) 

Lingual  (R) 

Insula  (R) 

Medial  orbitofrontal  (R) 

Insula  (R) 

Parahippocampal  (R) 

Insula  (R) 

Frontal  pole  (R) 

Insula  (R) 

Region  1 

Region  2 

Frontal  pole  (L) 

Rostral  anterior  cingulate  (L) 

Caudal  anterior  cingulate  (R) 

Rostral  anterior  cingulate  (L) 

Corpus  callosum  (R) 

Rostral  anterior  cingulate  (L) 

Corpus  callosum  (L) 

Rostral  middle  frontal  (L) 

Rostral  anterior  cingulate  (L) 

Rostral  middle  frontal  (L) 

Supra-marginal  (L) 

Rostral  middle  frontal  (L) 

Frontal  pole  (L) 

Rostral  middle  frontal  (L) 

Insula  (L) 

Rostral  middle  frontal  (L) 

Caudal  anterior  cingulate  (R) 

Rostral  middle  frontal  (L) 

Medial  orbitofrontal  (L) 

Superior  frontal  (L) 

Middle  temporal  (L) 

Superior  frontal  (L) 

Rostral  middle  frontal  (L) 

Superior  frontal  (L) 

Noncortical  (L) 

Superior  parietal  (L) 

Cuneus  (L) 

Superior  parietal  (L) 

Isthmus  of  the  cingulate  (L) 

Superior  parietal  (L) 

Transverse  temporal  (L) 

Superior  parietal  (L) 

Superior  parietal  (R) 

Superior  parietal  (L) 

Noncortical  (L) 

Superior  temporal  (L) 

Cuneus  (L) 

Superior  temporal  (L) 

Entorhinal  (L) 

Superior  temporal  (L) 

Inferior  parietal  (L) 

Superior  temporal  (L) 

Transverse  temporal  (L) 

Superior  temporal  (L) 

Isthmus  of  the  cingulate  (L) 

Supra-marginal  (L) 

Peri-calcarine  (L) 

Supra-marginal  (L) 

Rostral  middle  frontal  (L) 

Supra-marginal  (L) 

Insula  (L) 

Supra-marginal  (L) 

Caudal  anterior  cingulate  (L) 

Insula  (L) 

Isthmus  of  the  cingulate  (L) 

Insula  (L) 

Transverse  temporal  (L) 

Insula  (L) 

Caudal  anterior  cingulate  (R) 

Insula  (L) 
Table 9: Sex confusion matrices obtained with the generalized communicability topological metric.

All features | Women | Men
Women | 112 | 81
Men | 81 | 29

FDR selected features | Women | Men
Women | 106 | 87
Men | 34 | 76

Feature selection | Women | Men
Women | 180.9 | 12.1
Men | 11.5 | 98.5
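
As a reading aid, the overall accuracy implied by each matrix in Table 9 is its diagonal sum divided by the sum of all its entries. The following Python sketch is not part of the original analysis: it transcribes the table values and assumes only that the diagonal holds the correctly classified cases. The function name `accuracy` and the dictionary layout are illustrative choices.

```python
# Values transcribed from Table 9, ordered (Women, Men) on both axes.
# Whether rows hold the true or the assigned class is not stated in the
# table; the diagonal-over-total ratio is the same under either reading.
CONFUSION = {
    "All features": [[112.0, 81.0], [81.0, 29.0]],
    "FDR selected features": [[106.0, 87.0], [34.0, 76.0]],
    "Feature selection": [[180.9, 12.1], [11.5, 98.5]],
}


def accuracy(matrix):
    """Return the diagonal sum divided by the sum of all entries."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return diagonal / total


accuracies = {name: accuracy(m) for name, m in CONFUSION.items()}
for name, value in accuracies.items():
    print(f"{name}: {value:.2f}")
```

With the transcribed values the ratio is about 0.47 with all features, 0.60 with FDR selected features, and 0.92 with feature selection.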
Table 10: Kinship confusion matrices with the edge betweenness centrality topological metric.

All features | Identical Twins | Non-identical Multiples | Siblings | Unrelated
Identical Twins | 11 | 21 | 9 | 9
Non-identical Multiples | 18 | 27 | 10 | 12
Siblings | 11 | 11 | 9 | 4
Unrelated | 2 | 1 | 0 | 97

FDR selected features | Identical Twins | Non-identical Multiples | Siblings | Unrelated
Identical Twins | 8 | 12 | 7 | 23
Non-identical Multiples | 8 | 21 | 5 | 33
Siblings | 3 | 14 | 4 | 14
Unrelated | 18 | 26 | 8 | 48

Feature selection | Identical Twins | Non-identical Multiples | Siblings | Unrelated
Identical Twins | 38.2 | 8.1 | 2.4 | 1.3
Non-identical Multiples | 3.7 | 58.1 | 2.9 | 2.3
Siblings | 2.2 | 5.7 | 24.8 | 2.3
Unrelated | 0.2 | 1 | 0 | 98.2
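
The diagonal of each matrix in Table 10 can also be read class by class. The Python sketch below, which is not part of the original analysis, divides each diagonal entry of the feature selection matrix by its row total. Whether rows hold the true or the assigned kinship class is not stated in the table, so the ratios are per-class recall under the first reading and precision under the second. The names `LABELS` and `diagonal_rates` are illustrative.

```python
# Feature selection matrix transcribed from Table 10.
LABELS = ["Identical Twins", "Non-identical Multiples", "Siblings", "Unrelated"]
FEATURE_SELECTION = [
    [38.2, 8.1, 2.4, 1.3],
    [3.7, 58.1, 2.9, 2.3],
    [2.2, 5.7, 24.8, 2.3],
    [0.2, 1.0, 0.0, 98.2],
]


def diagonal_rates(matrix):
    """Return, for each row, the diagonal entry divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


rates = dict(zip(LABELS, diagonal_rates(FEATURE_SELECTION)))
for label, rate in rates.items():
    print(f"{label}: {rate:.3f}")
```

As transcribed, the Unrelated row of this matrix sums to 99.4, whereas the same row sums to 100 in the other two matrices, so its ratio should be treated as approximate.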
Table 11: Communicability best features for sex classification.

Region 1 | Region 2
Caudal anterior cingulate (L) | Noncortical (L)
Corpus callosum (L) | Noncortical (L)
Inferior temporal (L) | Noncortical (L)
Isthmus of the cingulate (L) | Noncortical (L)
Lingual (L) | Noncortical (L)
Pars opercularis (L) | Noncortical (L)
Posterior cingulate (L) | Noncortical (L)
Superior temporal (L) | Noncortical (L)
Transverse temporal (L) | Noncortical (L)
Insula (L) | Noncortical (L)
Noncortical (L) | Caudal anterior cingulate (L)
Corpus callosum (R) | Caudal anterior cingulate (L)
Corpus callosum (R) | Caudal middle frontal (L)
Inferior temporal (L) | Corpus callosum (L)
Posterior cingulate (L) | Corpus callosum (L)
Caudal anterior cingulate (L) | Cuneus (L)
Superior temporal (L) | Cuneus (L)
Parahippocampal (L) | Entorhinal (L)
Medial orbitofrontal (L) | Fusiform (L)
Entorhinal (L) | Inferior parietal (L)
Fusiform (L) | Inferior parietal (L)
Inferior temporal (L) | Inferior parietal (L)
Rostral anterior cingulate (L) | Inferior parietal (L)
Inferior parietal (L) | Inferior temporal (L)
Inferior temporal (L) | Inferior temporal (L)
Medial orbitofrontal (L) | Inferior temporal (L)
Frontal pole (L) | Inferior temporal (L)
Caudal anterior cingulate (L) | Isthmus of the cingulate (L)
Caudal middle frontal (L) | Isthmus of the cingulate (L)
Precuneus (L) | Isthmus of the cingulate (L)
Corpus callosum (R) | Isthmus of the cingulate (L)
Cuneus (R) | Isthmus of the cingulate (L)
Isthmus of the cingulate (R) | Isthmus of the cingulate (L)
Corpus callosum (L) | Lateral occipital (L)
Middle temporal (L) | Lateral occipital (L)
Superior parietal (L) | Lateral occipital (L)
Superior temporal (L) | Lateral occipital (L)
Frontal pole (L) | Lateral occipital (L)
Noncortical (R) | Lateral occipital (L)
Cuneus (R) | Lateral occipital (L)
Peri-calcarine (R) | Lateral occipital (L)
Superior temporal (R) | Lateral occipital (L)
Superior parietal (L) | Lateral orbitofrontal (L)
Frontal pole (L) | Lateral orbitofrontal (L)
Noncortical (L) | Lingual (L)
Cuneus (L) | Lingual (L)
Noncortical (R) | Lingual (L)
Corpus callosum (L) | Medial orbitofrontal (L)
Inferior temporal (L) | Medial orbitofrontal (L)

Region 1 | Region 2
Temporal pole (L) | Temporal pole (L)
Fusiform (L) | Transverse temporal (L)
Lingual (L) | Transverse temporal (L)
Middle temporal (L) | Transverse temporal (L)
Supra-marginal (R) | Transverse temporal (L)
Noncortical (L) | Insula (L)
Pars opercularis (L) | Insula (L)
Superior parietal (L) | Insula (L)
Temporal pole (L) | Insula (L)
Fusiform (L) | Noncortical (R)
Lateral occipital (L) | Noncortical (R)
Lingual (L) | Noncortical (R)
Inferior parietal (R) | Noncortical (R)
Paracentral (R) | Noncortical (R)
Entorhinal (R) | Caudal anterior cingulate (R)
Peri-calcarine (R) | Caudal anterior cingulate (R)
Temporal pole (R) | Caudal anterior cingulate (R)
Frontal pole (L) | Corpus callosum (R)
Caudal anterior cingulate (R) | Corpus callosum (R)
Insula (R) | Corpus callosum (R)
Entorhinal (L) | Cuneus (R)
Lateral occipital (L) | Cuneus (R)
Paracentral (L) | Cuneus (R)
Caudal anterior cingulate (R) | Cuneus (R)
Isthmus of the cingulate (R) | Cuneus (R)
Supra-marginal (R) | Cuneus (R)
Cuneus (L) | Entorhinal (R)
Inferior parietal (R) | Fusiform (R)
Isthmus of the cingulate (R) | Fusiform (R)
Parahippocampal (R) | Fusiform (R)
Supra-marginal (R) | Fusiform (R)
Noncortical (R) | Inferior parietal (R)
Lateral occipital (L) | Inferior temporal (R)
Pars orbitalis (R) | Inferior temporal (R)
Transverse temporal (L) | Isthmus of the cingulate (R)
Caudal anterior cingulate (R) | Isthmus of the cingulate (R)
Corpus callosum (R) | Isthmus of the cingulate (R)
Cuneus (R) | Isthmus of the cingulate (R)
Middle temporal (R) | Isthmus of the cingulate (R)
Superior frontal (R) | Isthmus of the cingulate (R)
Temporal pole (R) | Isthmus of the cingulate (R)
Lateral occipital (L) | Lateral occipital (R)
Pars triangularis (R) | Lateral occipital (R)
Peri-calcarine (R) | Lateral occipital (R)
Superior parietal (R) | Lateral occipital (R)
Caudal anterior cingulate (L) | Lateral orbitofrontal (R)
Medial orbitofrontal (L) | Lateral orbitofrontal (R)
Caudal middle frontal (R) | Lateral orbitofrontal (R)
Corpus callosum (R) | Lateral orbitofrontal (R)
Table 11 (continued from previous page)

Region 1 | Region 2
Pars opercularis (L) | Medial orbitofrontal (L)
Transverse temporal (R) | Medial orbitofrontal (L)
Inferior parietal (L) | Middle temporal (L)
Transverse temporal (L) | Middle temporal (L)
Entorhinal (L) | Parahippocampal (L)
Inferior parietal (L) | Parahippocampal (L)
Superior temporal (L) | Parahippocampal (L)
Temporal pole (L) | Parahippocampal (L)
Transverse temporal (L) | Parahippocampal (L)
Caudal anterior cingulate (R) | Parahippocampal (L)
Medial orbitofrontal (R) | Parahippocampal (L)
Frontal pole (R) | Parahippocampal (L)
Noncortical (L) | Paracentral (L)
Inferior parietal (L) | Paracentral (L)
Lateral occipital (L) | Paracentral (L)
Medial orbitofrontal (L) | Paracentral (L)
Peri-calcarine (L) | Paracentral (L)
Postcentral (L) | Paracentral (L)
Posterior cingulate (L) | Paracentral (L)
Superior parietal (L) | Paracentral (L)
Fusiform (R) | Paracentral (L)
Inferior parietal (R) | Paracentral (L)
Inferior temporal (R) | Paracentral (L)
Isthmus of the cingulate (R) | Paracentral (L)
Medial orbitofrontal (R) | Paracentral (L)
Paracentral (R) | Paracentral (L)
Precuneus (R) | Paracentral (L)
Fusiform (L) | Pars opercularis (L)
Medial orbitofrontal (L) | Pars opercularis (L)
Postcentral (L) | Pars opercularis (L)
Caudal anterior cingulate (R) | Pars opercularis (L)
Caudal middle frontal (R) | Pars opercularis (L)
Entorhinal (R) | Pars opercularis (L)
Fusiform (L) | Pars orbitalis (L)
Pars triangularis (L) | Pars orbitalis (L)
Noncortical (L) | Pars triangularis (L)
Rostral middle frontal (L) | Pars triangularis (L)
Posterior cingulate (R) | Pars triangularis (L)
Rostral anterior cingulate (R) | Pars triangularis (L)
Cuneus (L) | Peri-calcarine (L)
Pars triangularis (L) | Peri-calcarine (L)
Rostral middle frontal (L) | Peri-calcarine (L)
Transverse temporal (L) | Peri-calcarine (L)
Caudal anterior cingulate (R) | Peri-calcarine (L)
Lingual (R) | Peri-calcarine (L)
Posterior cingulate (R) | Peri-calcarine (L)
Precuneus (R) | Peri-calcarine (L)
Noncortical (L) | Postcentral (L)
Inferior parietal (L) | Postcentral (L)
Lateral orbitofrontal (L) | Postcentral (L)

Region 1 | Region 2
Inferior parietal (R) | Lateral orbitofrontal (R)
Lateral orbitofrontal (L) | Lingual (R)
Medial orbitofrontal (R) | Lingual (R)
Paracentral (R) | Lingual (R)
Paracentral (L) | Medial orbitofrontal (R)
Superior parietal (L) | Medial orbitofrontal (R)
Caudal anterior cingulate (R) | Medial orbitofrontal (R)
Paracentral (R) | Medial orbitofrontal (R)
Posterior cingulate (R) | Medial orbitofrontal (R)
Isthmus of the cingulate (L) | Middle temporal (R)
Precuneus (L) | Middle temporal (R)
Isthmus of the cingulate (R) | Middle temporal (R)
Superior temporal (R) | Middle temporal (R)
Noncortical (L) | Parahippocampal (R)
Peri-calcarine (L) | Parahippocampal (R)
Supra-marginal (L) | Parahippocampal (R)
Parahippocampal (L) | Paracentral (R)
Posterior cingulate (R) | Paracentral (R)
Frontal pole (L) | Pars opercularis (R)
Rostral middle frontal (R) | Pars opercularis (R)
Insula (R) | Pars opercularis (R)
Corpus callosum (L) | Pars orbitalis (R)
Paracentral (L) | Pars orbitalis (R)
Corpus callosum (R) | Pars orbitalis (R)
Pars orbitalis (R) | Pars orbitalis (R)
Rostral middle frontal (R) | Pars orbitalis (R)
Isthmus of the cingulate (L) | Pars triangularis (R)
Fusiform (R) | Pars triangularis (R)
Corpus callosum (L) | Peri-calcarine (R)
Lateral occipital (L) | Peri-calcarine (R)
Lingual (L) | Peri-calcarine (R)
Peri-calcarine (L) | Peri-calcarine (R)
Caudal anterior cingulate (R) | Peri-calcarine (R)
Corpus callosum (R) | Peri-calcarine (R)
Superior parietal (R) | Peri-calcarine (R)
Paracentral (R) | Postcentral (R)
Precuneus (R) | Postcentral (R)
Superior parietal (R) | Postcentral (R)
Frontal pole (R) | Postcentral (R)
Pre-central (L) | Posterior cingulate (R)
Isthmus of the cingulate (R) | Pre-central (R)
Pars opercularis (R) | Pre-central (R)
Temporal pole (R) | Pre-central (R)
Pars orbitalis (L) | Precuneus (R)
Rostral middle frontal (L) | Precuneus (R)
Transverse temporal (L) | Precuneus (R)
Cuneus (R) | Precuneus (R)
Fusiform (R) | Precuneus (R)
Inferior temporal (R) | Precuneus (R)
Pars triangularis (L) | Rostral anterior cingulate (R)
Table 11 (continued from previous page)

Region 1 | Region 2
Paracentral (L) | Postcentral (L)
Pars opercularis (L) | Postcentral (L)
Temporal pole (L) | Postcentral (L)
Caudal anterior cingulate (L) | Pre-central (L)
Medial orbitofrontal (L) | Pre-central (L)
Parahippocampal (L) | Pre-central (L)
Supra-marginal (L) | Pre-central (L)
Caudal anterior cingulate (R) | Pre-central (L)
Lateral occipital (R) | Pre-central (L)
Medial orbitofrontal (R) | Pre-central (L)
Superior frontal (R) | Pre-central (L)
Superior parietal (R) | Pre-central (L)
Fusiform (L) | Precuneus (L)
Posterior cingulate (L) | Precuneus (L)
Superior frontal (L) | Precuneus (L)
Insula (L) | Precuneus (L)
Temporal pole (R) | Precuneus (L)
Inferior parietal (L) | Rostral anterior cingulate (L)
Lateral orbitofrontal (L) | Rostral anterior cingulate (L)
Pars orbitalis (L) | Rostral anterior cingulate (L)
Superior parietal (L) | Rostral anterior cingulate (L)
Superior temporal (L) | Rostral anterior cingulate (L)
Caudal middle frontal (L) | Rostral middle frontal (L)
Pars orbitalis (L) | Rostral middle frontal (L)
Postcentral (L) | Rostral middle frontal (L)
Superior parietal (L) | Rostral middle frontal (L)
Isthmus of the cingulate (R) | Rostral middle frontal (L)
Paracentral (L) | Superior frontal (L)
Pre-central (L) | Superior frontal (L)
Medial orbitofrontal (R) | Superior frontal (L)
Lateral occipital (L) | Superior parietal (L)
Supra-marginal (L) | Superior parietal (L)
Insula (L) | Superior parietal (L)
Isthmus of the cingulate (R) | Superior parietal (L)
Rostral anterior cingulate (R) | Superior parietal (L)
Pars triangularis (L) | Superior temporal (L)
Supra-marginal (L) | Superior temporal (L)
Temporal pole (L) | Superior temporal (L)
Corpus callosum (L) | Supra-marginal (L)
Lateral orbitofrontal (L) | Supra-marginal (L)
Pars orbitalis (L) | Supra-marginal (L)
Posterior cingulate (L) | Supra-marginal (L)
Pre-central (L) | Supra-marginal (L)
Superior frontal (L) | Supra-marginal (L)
Superior parietal (L) | Supra-marginal (L)
Precuneus (R) | Supra-marginal (L)
Lateral orbitofrontal (L) | Frontal pole (L)
Superior parietal (L) | Frontal pole (L)
Caudal anterior cingulate (R) | Frontal pole (L)
Pars triangularis (R) | Frontal pole (L)

Region 1 | Region 2
Caudal anterior cingulate (L) | Rostral middle frontal (R)
Rostral anterior cingulate (L) | Rostral middle frontal (R)
Caudal anterior cingulate (R) | Rostral middle frontal (R)
Pars opercularis (R) | Rostral middle frontal (R)
Pars orbitalis (R) | Rostral middle frontal (R)
Caudal anterior cingulate (L) | Superior frontal (R)
Pars orbitalis (L) | Superior frontal (R)
Isthmus of the cingulate (R) | Superior frontal (R)
Paracentral (R) | Superior frontal (R)
Pars triangularis (R) | Superior frontal (R)
Posterior cingulate (R) | Superior frontal (R)
Frontal pole (R) | Superior frontal (R)
Insula (R) | Superior frontal (R)
Caudal anterior cingulate (L) | Superior parietal (R)
Transverse temporal (L) | Superior parietal (R)
Caudal anterior cingulate (R) | Superior parietal (R)
Isthmus of the cingulate (R) | Superior parietal (R)
Superior temporal (R) | Superior parietal (R)
Inferior parietal (R) | Superior temporal (R)
Middle temporal (R) | Superior temporal (R)
Pars triangularis (R) | Superior temporal (R)
Peri-calcarine (R) | Superior temporal (R)
Transverse temporal (R) | Superior temporal (R)
Superior temporal (L) | Supra-marginal (R)
Transverse temporal (L) | Supra-marginal (R)
Cuneus (R) | Supra-marginal (R)
Fusiform (R) | Supra-marginal (R)
Cuneus (L) | Frontal pole (R)
Inferior temporal (L) | Frontal pole (R)
Parahippocampal (L) | Frontal pole (R)
Pars orbitalis (L) | Frontal pole (R)
Peri-calcarine (L) | Frontal pole (R)
Posterior cingulate (R) | Frontal pole (R)
Cuneus (L) | Temporal pole (R)
Transverse temporal (L) | Temporal pole (R)
Isthmus of the cingulate (R) | Temporal pole (R)
Parahippocampal (R) | Temporal pole (R)
Temporal pole (R) | Temporal pole (R)
Transverse temporal (R) | Temporal pole (R)
Medial orbitofrontal (L) | Transverse temporal (R)
Parahippocampal (L) | Transverse temporal (R)
Pars opercularis (L) | Transverse temporal (R)
Pars orbitalis (L) | Transverse temporal (R)
Peri-calcarine (R) | Transverse temporal (R)
Superior temporal (R) | Transverse temporal (R)
Noncortical (L) | Insula (R)
Corpus callosum (R) | Insula (R)
Inferior parietal (R) | Insula (R)
Parahippocampal (R) | Insula (R)
Pars triangularis (R) | Insula (R)
Table 11 (continued from previous page)


Region  1 

Region  2 

Insula  (R) 

Caudal  anterior  cingulate  (L) 

Frontal  pole  (L) 

Frontal  pole  (L) 
Temporal  pole  (L) 
Temporal  pole  (L) 

Region  1 

Region  2 

Superior  frontal  (R) 

Supra-marginal  (R) 

Insula  (R) 

Insula  (R) 
Table 12: Edge betweenness centrality best features for kinship classification.

Region 1 | Region 2
Caudal middle frontal (L) | Noncortical (L)
Cuneus (L) | Noncortical (L)
Fusiform (L) | Noncortical (L)
Isthmus of the cingulate (L) | Caudal anterior cingulate (L)
Pars opercularis (L) | Caudal anterior cingulate (L)
Caudal anterior cingulate (R) | Caudal anterior cingulate (L)
Posterior cingulate (R) | Caudal anterior cingulate (L)
Superior frontal (R) | Caudal anterior cingulate (L)
Noncortical (L) | Caudal middle frontal (L)
Pars opercularis (L) | Caudal middle frontal (L)
Supra-marginal (L) | Caudal middle frontal (L)
Lingual (L) | Corpus callosum (L)
Posterior cingulate (L) | Corpus callosum (L)
Pre-central (L) | Corpus callosum (L)
Rostral anterior cingulate (L) | Corpus callosum (L)
Superior parietal (L) | Corpus callosum (L)
Supra-marginal (L) | Corpus callosum (L)
Isthmus of the cingulate (R) | Corpus callosum (L)
Lateral occipital (R) | Corpus callosum (L)
Medial orbitofrontal (R) | Corpus callosum (L)
Pars triangularis (R) | Corpus callosum (L)
Peri-calcarine (R) | Corpus callosum (L)
Superior parietal (L) | Cuneus (L)
Cuneus (R) | Cuneus (L)
Middle temporal (L) | Entorhinal (L)
Temporal pole (L) | Entorhinal (L)
Insula (L) | Entorhinal (L)
Cuneus (L) | Fusiform (L)
Lateral occipital (L) | Fusiform (L)
Middle temporal (L) | Fusiform (L)
Precuneus (L) | Fusiform (L)
Transverse temporal (L) | Fusiform (L)
Noncortical (L) | Inferior parietal (L)
Pars opercularis (L) | Inferior parietal (L)
Fusiform (L) | Inferior temporal (L)
Precuneus (L) | Inferior temporal (L)
Superior temporal (L) | Inferior temporal (L)
Temporal pole (L) | Inferior temporal (L)
Transverse temporal (L) | Inferior temporal (L)
Entorhinal (L) | Isthmus of the cingulate (L)
Lingual (L) | Isthmus of the cingulate (L)
Middle temporal (L) | Isthmus of the cingulate (L)
Supra-marginal (L) | Isthmus of the cingulate (L)
Cuneus (R) | Isthmus of the cingulate (L)
Parahippocampal (R) | Isthmus of the cingulate (L)
Peri-calcarine (L) | Lateral occipital (L)
Superior temporal (L) | Lateral occipital (L)
Corpus callosum (L) | Lateral orbitofrontal (L)
Insula (L) | Lateral orbitofrontal (L)

Region 1 | Region 2
Rostral middle frontal (R) | Caudal anterior cingulate (R)
Superior frontal (R) | Caudal anterior cingulate (R)
Caudal anterior cingulate (L) | Caudal middle frontal (R)
Noncortical (R) | Caudal middle frontal (R)
Pars opercularis (R) | Caudal middle frontal (R)
Postcentral (R) | Caudal middle frontal (R)
Rostral middle frontal (R) | Caudal middle frontal (R)
Posterior cingulate (L) | Corpus callosum (R)
Precuneus (L) | Corpus callosum (R)
Caudal middle frontal (R) | Corpus callosum (R)
Isthmus of the cingulate (R) | Corpus callosum (R)
Medial orbitofrontal (R) | Corpus callosum (R)
Precuneus (R) | Corpus callosum (R)
Cuneus (L) | Cuneus (R)
Lingual (R) | Cuneus (R)
Frontal pole (L) | Entorhinal (R)
Inferior temporal (R) | Entorhinal (R)
Entorhinal (R) | Fusiform (R)
Peri-calcarine (R) | Fusiform (R)
Precuneus (R) | Fusiform (R)
Fusiform (R) | Inferior parietal (R)
Superior parietal (R) | Inferior parietal (R)
Supra-marginal (R) | Inferior parietal (R)
Insula (R) | Inferior parietal (R)
Fusiform (R) | Inferior temporal (R)
Lateral occipital (R) | Inferior temporal (R)
Peri-calcarine (R) | Inferior temporal (R)
Superior temporal (R) | Inferior temporal (R)
Temporal pole (R) | Inferior temporal (R)
Parahippocampal (L) | Isthmus of the cingulate (R)
Posterior cingulate (L) | Isthmus of the cingulate (R)
Precuneus (L) | Isthmus of the cingulate (R)
Caudal anterior cingulate (R) | Isthmus of the cingulate (R)
Entorhinal (R) | Isthmus of the cingulate (R)
Fusiform (R) | Isthmus of the cingulate (R)
Paracentral (R) | Isthmus of the cingulate (R)
Peri-calcarine (R) | Isthmus of the cingulate (R)
Middle temporal (R) | Lateral occipital (R)
Supra-marginal (R) | Lateral occipital (R)
Transverse temporal (R) | Lateral occipital (R)
Rostral anterior cingulate (L) | Lateral orbitofrontal (R)
Caudal anterior cingulate (R) | Lateral orbitofrontal (R)
Fusiform (R) | Lateral orbitofrontal (R)
Rostral middle frontal (R) | Lateral orbitofrontal (R)
Superior temporal (R) | Lateral orbitofrontal (R)
Insula (R) | Lateral orbitofrontal (R)
Noncortical (R) | Lingual (R)
Entorhinal (R) | Lingual (R)
Posterior cingulate (R) | Lingual (R)
Table 12 (continued from previous page)

Region 1 | Region 2
Noncortical (L) | Lingual (L)
Cuneus (L) | Lingual (L)
Fusiform (L) | Lingual (L)
Isthmus of the cingulate (L) | Lingual (L)
Parahippocampal (L) | Lingual (L)
Peri-calcarine (L) | Lingual (L)
Precuneus (L) | Lingual (L)
Superior parietal (L) | Lingual (L)
Isthmus of the cingulate (R) | Lingual (L)
Pars triangularis (L) | Medial orbitofrontal (L)
Rostral anterior cingulate (L) | Medial orbitofrontal (L)
Frontal pole (L) | Medial orbitofrontal (L)
Inferior parietal (L) | Middle temporal (L)
Lingual (L) | Middle temporal (L)
Entorhinal (L) | Parahippocampal (L)
Fusiform (L) | Parahippocampal (L)
Precuneus (L) | Paracentral (L)
Superior frontal (L) | Paracentral (L)
Supra-marginal (L) | Pars opercularis (L)
Rostral anterior cingulate (L) | Pars orbitalis (L)
Rostral middle frontal (L) | Pars orbitalis (L)
Pars opercularis (L) | Pars triangularis (L)
Cuneus (L) | Peri-calcarine (L)
Pars opercularis (L) | Postcentral (L)
Posterior cingulate (L) | Postcentral (L)
Pre-central (L) | Postcentral (L)
Precuneus (L) | Postcentral (L)
Insula (L) | Postcentral (L)
Caudal middle frontal (L) | Posterior cingulate (L)
Paracentral (R) | Posterior cingulate (L)
Posterior cingulate (R) | Posterior cingulate (L)
Precuneus (R) | Posterior cingulate (L)
Rostral anterior cingulate (R) | Posterior cingulate (L)
Caudal anterior cingulate (L) | Pre-central (L)
Corpus callosum (L) | Pre-central (L)
Posterior cingulate (L) | Pre-central (L)
Supra-marginal (L) | Pre-central (L)
Insula (L) | Pre-central (L)
Corpus callosum (L) | Precuneus (L)
Postcentral (L) | Precuneus (L)
Inferior parietal (R) | Precuneus (L)
Posterior cingulate (R) | Precuneus (L)
Superior parietal (R) | Precuneus (L)
Frontal pole (L) | Rostral anterior cingulate (L)
Lateral orbitofrontal (R) | Rostral anterior cingulate (L)
Rostral middle frontal (R) | Rostral anterior cingulate (L)
Corpus callosum (L) | Rostral middle frontal (L)
Lateral orbitofrontal (L) | Rostral middle frontal (L)
Medial orbitofrontal (L) | Rostral middle frontal (L)
Rostral anterior cingulate (L) | Rostral middle frontal (L)

Region 1 | Region 2
Caudal anterior cingulate (L) | Medial orbitofrontal (R)
Corpus callosum (L) | Medial orbitofrontal (R)
Medial orbitofrontal (L) | Medial orbitofrontal (R)
Corpus callosum (R) | Medial orbitofrontal (R)
Entorhinal (R) | Medial orbitofrontal (R)
Pars orbitalis (R) | Medial orbitofrontal (R)
Superior temporal (R) | Medial orbitofrontal (R)
Insula (R) | Medial orbitofrontal (R)
Noncortical (R) | Middle temporal (R)
Inferior parietal (R) | Middle temporal (R)
Lateral occipital (R) | Middle temporal (R)
Lateral orbitofrontal (R) | Middle temporal (R)
Entorhinal (R) | Parahippocampal (R)
Fusiform (R) | Parahippocampal (R)
Lingual (R) | Parahippocampal (R)
Middle temporal (R) | Parahippocampal (R)
Temporal pole (R) | Parahippocampal (R)
Insula (R) | Parahippocampal (R)
Posterior cingulate (R) | Paracentral (R)
Transverse temporal (R) | Paracentral (R)
Pre-central (R) | Pars opercularis (R)
Frontal pole (L) | Pars orbitalis (R)
Medial orbitofrontal (R) | Pars orbitalis (R)
Parahippocampal (R) | Pars triangularis (R)
Entorhinal (R) | Peri-calcarine (R)
Lingual (R) | Peri-calcarine (R)
Paracentral (L) | Postcentral (R)
Parahippocampal (R) | Postcentral (R)
Superior parietal (R) | Postcentral (R)
Transverse temporal (R) | Postcentral (R)
Caudal anterior cingulate (L) | Posterior cingulate (R)
Pre-central (L) | Posterior cingulate (R)
Paracentral (R) | Posterior cingulate (R)
Postcentral (R) | Posterior cingulate (R)
Superior frontal (R) | Posterior cingulate (R)
Superior parietal (R) | Posterior cingulate (R)
Caudal anterior cingulate (L) | Pre-central (R)
Pars opercularis (R) | Pre-central (R)
Pars orbitalis (R) | Pre-central (R)
Pars triangularis (R) | Pre-central (R)
Precuneus (L) | Precuneus (R)
Noncortical (R) | Precuneus (R)
Rostral middle frontal (L) | Rostral anterior cingulate (R)
Corpus callosum (R) | Rostral anterior cingulate (R)
Lateral orbitofrontal (R) | Rostral anterior cingulate (R)
Rostral anterior cingulate (L) | Rostral middle frontal (R)
Caudal anterior cingulate (R) | Rostral middle frontal (R)
Corpus callosum (R) | Rostral middle frontal (R)
Pars opercularis (R) | Rostral middle frontal (R)
Frontal pole (R) | Rostral middle frontal (R)
Table 12 (continued from previous page)

Region 1 | Region 2
Caudal anterior cingulate (L) | Superior frontal (L)
Temporal pole (L) | Superior frontal (L)
Caudal anterior cingulate (R) | Superior frontal (L)
Pre-central (R) | Superior frontal (L)
Noncortical (L) | Superior parietal (L)
Cuneus (L) | Superior parietal (L)
Lateral occipital (L) | Superior parietal (L)
Postcentral (L) | Superior parietal (L)
Transverse temporal (L) | Superior parietal (L)
Corpus callosum (R) | Superior parietal (L)
Postcentral (R) | Superior parietal (L)
Superior parietal (R) | Superior parietal (L)
Insula (L) | Superior temporal (L)
Rostral anterior cingulate (L) | Frontal pole (L)
Rostral middle frontal (L) | Frontal pole (L)
Entorhinal (L) | Temporal pole (L)
Superior temporal (L) | Temporal pole (L)
Caudal middle frontal (L) | Insula (L)
Inferior parietal (L) | Insula (L)
Superior frontal (L) | Insula (L)
Transverse temporal (L) | Insula (L)
Lingual (R) | Noncortical (R)
Supra-marginal (R) | Noncortical (R)
Transverse temporal (R) | Noncortical (R)
Caudal anterior cingulate (L) | Caudal anterior cingulate (R)
Pars triangularis (R) | Caudal anterior cingulate (R)
Posterior cingulate (R) | Caudal anterior cingulate (R)

Region 1 | Region 2
Medial orbitofrontal (L) | Superior frontal (R)
Paracentral (R) | Superior frontal (R)
Pars triangularis (R) | Superior frontal (R)
Corpus callosum (L) | Superior parietal (R)
Isthmus of the cingulate (L) | Superior parietal (R)
Superior parietal (L) | Superior parietal (R)
Caudal middle frontal (R) | Superior parietal (R)
Precuneus (R) | Superior parietal (R)
Supra-marginal (R) | Superior parietal (R)
Transverse temporal (R) | Superior parietal (R)
Noncortical (R) | Superior temporal (R)
Lateral occipital (R) | Superior temporal (R)
Transverse temporal (R) | Superior temporal (R)
Noncortical (R) | Supra-marginal (R)
Caudal middle frontal (R) | Supra-marginal (R)
Isthmus of the cingulate (R) | Supra-marginal (R)
Pars triangularis (R) | Supra-marginal (R)
Transverse temporal (R) | Supra-marginal (R)
Lateral orbitofrontal (R) | Frontal pole (R)
Rostral anterior cingulate (R) | Frontal pole (R)
Rostral middle frontal (R) | Frontal pole (R)
Entorhinal (R) | Temporal pole (R)
Fusiform (R) | Temporal pole (R)
Parahippocampal (R) | Temporal pole (R)
Superior temporal (R) | Temporal pole (R)
Pre-central (R) | Insula (R)
Temporal pole (R) | Insula (R)
Table 13: Differences in the probability of connection (connectivity matrix) due to sex.

Region 1 | Region 2 | Women | Men | p-value
Frontal pole (L) | Caudal anterior cingulate (L) | 0.0032 | 0 | 2.10E-03
Medial orbitofrontal (R) | Caudal anterior cingulate (L) | 0.0115 | 0.0094 | 2.66E-02
Transverse temporal (L) | Cuneus (L) | 0.0011 | 0 | 1.94E-02
Paracentral (R) | Isthmus of the cingulate (L) | 0.0073 | 0.0053 | 7.70E-03
Cuneus (R) | Lateral occipital (L) | 0.0029 | 0 | 5.00E-04
Noncortical (L) | Lingual (L) | 0.0925 | 0.0631 | 1.20E-03
Lateral orbitofrontal (L) | Parahippocampal (L) | 0.0031 | 0.002 | 1.93E-02
Peri-calcarine (L) | Parahippocampal (L) | 0.0055 | 0.0037 | 5.00E-03
Posterior cingulate (L) | Paracentral (L) | 0.1544 | 0.1383 | 1.66E-02
Postcentral (L) | Pars opercularis (L) | 0.0042 | 0.0017 | 4.00E-04
Pars opercularis (L) | Postcentral (L) | 0.0044 | 0.0022 | 1.00E-04
Caudal anterior cingulate (R) | Posterior cingulate (L) | 0.0321 | 0.0232 | 1.07E-02
Precuneus (L) | Pre-central (L) | 0.0087 | 0.008 | 1.82E-02
Supra-marginal (L) | Superior temporal (L) | 0.0321 | 0.0244 | 2.80E-03
Pre-central (R) | Noncortical (R) | 0.003 | 0.0017 | 1.81E-02
Inferior parietal (L) | Corpus callosum (R) | 0.0022 | 0.0009 | 1.78E-02
Noncortical (R) | Corpus callosum (R) | 0.0012 | 0.0007 | 1.61E-02
Inferior parietal (R) | Corpus callosum (R) | 0.0054 | 0.0015 | 5.90E-03
Lingual (R) | Corpus callosum (R) | 0.0028 | 0.0014 | 1.70E-02
Corpus callosum (R) | Inferior parietal (R) | 0.0037 | 0.0019 | 3.30E-02
Caudal anterior cingulate (R) | Isthmus of the cingulate (R) | 0.0209 | 0.0154 | 2.79E-02
Caudal anterior cingulate (R) | Medial orbitofrontal (R) | 0.0164 | 0.0097 | 2.40E-03
Pars orbitalis (R) | Medial orbitofrontal (R) | 0.0378 | 0.0238 | 1.78E-02
Middle temporal (R) | Parahippocampal (R) | 0.0043 | 0.0027 | 1.82E-02
Insula (R) | Parahippocampal (R) | 0.0046 | 0.0031 | 1.79E-02
Rostral middle frontal (R) | Pars opercularis (R) | 0.0269 | 0.0226 | 1.04E-02
Superior frontal (R) | Pars triangularis (R) | 0.0054 | 0.0037 | 7.70E-03
Precuneus (R) | Postcentral (R) | 0.0079 | 0.0056 | 5.80E-03
Pars triangularis (L) | Rostral anterior cingulate (R) | 0.0107 | 0.0059 | 2.75E-02
Pre-central (R) | Rostral middle frontal (R) | 0.0037 | 0.0029 | 6.10E-03
Rostral middle frontal (L) | Superior frontal (R) | 0.0044 | 0.0023 | 2.90E-03
Isthmus of the cingulate (R) | Superior frontal (R) | 0.0077 | 0.0056 | 1.50E-02
Superior temporal (R) | Transverse temporal (R) | 0.0382 | 0.0306 | 2.21E-02
Supra-marginal (R) | Transverse temporal (R) | 0.0171 | 0.013 | 9.50E-03
Insula (R) | Transverse temporal (R) | 0.0118 | 0.0112 | 1.24E-02
Pars triangularis (R) | Insula (R) | 0.1671 | 0.1428 | 5.80E-03

Table 14: Sex differences via the topological clustering coefficient.

Region                          Women   Men     p-value
Caudal anterior cingulate (L)   0.0449  0.0385  6.0e-4
Pars orbitalis (L)              0.2715  0.2143  1.8e-3
Rostral anterior cingulate (L)  0.0501  0.0451  1.7e-3
Rostral middle frontal (L)      0.0628  0.0572  6.2e-3
Cuneus (R)                      0.1417  0.1224  5.0e-3
Middle temporal (R)             0.0783  0.0729  7.3e-3

Table 15: Sex differences via the topological edge betweenness centrality from region 1 to
region 2.

Region 1                  Region 2                       Women    Men     p-value
Medial orbitofrontal (R)  Caudal anterior cingulate (R)  3.6796   0.1343  3.0e-4
Non-cortical (L)          Lingual (L)                    10.0475  3.8471  3.0e-4
Lingual (L)               Parahippocampal (L)            9.5410   2.9989  4.0e-4
Supra-marginal (R)        Peri-calcarine (L)             0.0470   0.0003  2e-4
Precuneus (R)             Corpus callosum (R)            2.6160   0.4481  3e-4

Table 16: Sex differences via the topological communicability matrix from region 1 to
region 2.

Region 1                  Region 2                    Women        Men          p-value
Lingual (R)               Fusiform (L)                0.000193818  6.6184E-05   0.00E+00
Lingual (R)               Parahippocampal (L)         1.44873E-05  1.18653E-06  0.00E+00
Frontal pole (R)          Parahippocampal (L)         2.02031E-06  3.4342E-08   2.00E-04
Transverse temporal (R)   Parahippocampal (L)         3.76227E-07  3.06979E-08  1.00E-04
Parahippocampal (L)       Pars orbitalis (L)          6.93337E-06  9.09265E-07  2.00E-04
Parahippocampal (R)       Rostral middle frontal (L)  6.00877E-06  7.85872E-07  2.00E-04
Medial orbitofrontal (R)  Superior parietal (L)       6.58429E-05  2.26133E-05  3.00E-04
Lateral occipital (L)     Medial orbitofrontal (R)    1.74974E-06  3.49259E-07  2.00E-04
Middle temporal (L)       Medial orbitofrontal (R)    4.61992E-05  7.96382E-08  2.00E-04
Superior parietal (L)     Medial orbitofrontal (R)    1.16508E-05  3.44506E-06  2.00E-04
Superior temporal (L)     Medial orbitofrontal (R)    7.71885E-06  3.86133E-07  3.00E-04
Inferior parietal (R)     Transverse temporal (R)     0.000685963  0.000223199  3.00E-04

Figure 1: Brain modularity obtained from the average of the brain connectivity matrices:
a) level I, b) level II. Different colors indicate different modules. The numbers correspond
to the cortical regions indicated in Table 1 (main document), and their location in the
figure corresponds to the geometric center of each region in the center of the axial plane.

Figure 2: Motifs of size three (taken from Sporns, O., Kötter, R., 2004. Motifs in brain
networks. PLoS Biol. 2, e369).

Figure 3: Levels of sparsity (proportion of non-zeros) of the mean connectivity matrix
thresholded at different values: a) not thresholded, level of sparsity 0.564; b) thresholded
at 0.0125, level of sparsity 0.151; c) thresholded at 0.025, level of sparsity 0.116; and d)
thresholded at 0.0375, level of sparsity 0.095.
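
The sparsity levels quoted in Figure 3 can be illustrated with a minimal sketch. It assumes that thresholding sets every entry below the cutoff to zero and that sparsity is the fraction of non-zero entries over the full matrix, diagonal included. The helper names and the 4 x 4 toy matrix are hypothetical, not taken from the study, so the printed values will not match those in the caption.

```python
def threshold(matrix, cutoff):
    """Return a copy of matrix with entries below cutoff set to zero.

    Assumption: entries equal to the cutoff are kept.
    """
    return [[v if v >= cutoff else 0.0 for v in row] for row in matrix]


def proportion_nonzero(matrix):
    """Fraction of non-zero entries (the 'level of sparsity' in Figure 3).

    Assumption: every entry, including the diagonal, counts in the total.
    """
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for v in row if v != 0)
    return nonzero / total


# Hypothetical toy connectivity matrix (symmetric, zero diagonal).
mean_conn = [
    [0.00, 0.15, 0.02, 0.00],
    [0.15, 0.00, 0.03, 0.01],
    [0.02, 0.03, 0.00, 0.00],
    [0.00, 0.01, 0.00, 0.00],
]

# Same cutoffs as the four panels of Figure 3.
for cutoff in (0.0, 0.0125, 0.025, 0.0375):
    level = proportion_nonzero(threshold(mean_conn, cutoff))
    print(f"cutoff {cutoff}: level of sparsity {level:.3f}")
```

As in the figure, raising the cutoff can only remove connections, so the proportion of non-zeros never increases from one panel to the next.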

Figure 4: Sex differences considering global topological metrics.


Figure 5: Z-score kinship differences considering global topological metrics: a) identical
twins vs non-identical multiples, b) identical twins vs siblings, c) identical twins vs
unrelated, d) non-identical multiples vs siblings, e) non-identical multiples vs unrelated,
and f) siblings vs unrelated.

Selected Results Using Diffusion Tensor-Tractography

Figure 6: Selected features on the connectivity matrix for a) sex and b) kinship
classification. Color code corresponds to the score given by the feature selection algorithm.



Figure  7:  Z-score  sex  differences  from  a)  the  connectivity  matrix,  b)  the  communicability 
matrix.  The  color  map  indicates  where  the  probability  of  connection  is  higher  for  women 
(magenta)  or  for  men  (cyan). 

Figure 8: Z-score kinship differences considering the communicability eigenvalues: a)
identical twins vs non-identical multiples, b) identical twins vs siblings, c) identical twins
vs unrelated, d) non-identical multiples vs siblings, e) non-identical multiples vs unrelated,
and f) siblings vs unrelated.